Authors: Yash Patel, Michal Bušta, Lukáš Neumann, Jiri Matas
Description: A CNN-based approach is used for script identification in cropped word images. The convolutional layers of the VGG-16 architecture are followed by a global average pooling layer and two fully connected layers. To preserve the aspect ratio of input images in both training and testing, the images are resized into fixed-height (64), variable-width tensors. For training, the convolutional layers are initialized with ImageNet weights. A categorical cross-entropy loss is used, and all layers (both convolutional and fully connected) are updated during back-propagation.
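
The fixed-height, variable-width resizing can be illustrated with a small sketch. This is not the authors' code: the function names `resized_shape` and `gap_input_shape` are hypothetical, rounding to the nearest pixel is an assumption, and the second helper assumes that the five 2x2, stride-2 max-pooling layers of standard VGG-16 are kept alongside the convolutional layers, which the description does not state explicitly.

```python
def resized_shape(orig_width, orig_height, target_height=64):
    """Return (width, height) after scaling an image to target_height
    while keeping its aspect ratio. Rounding mode is an assumption."""
    if orig_width <= 0 or orig_height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / orig_height
    # Round to the nearest pixel, but never let the width collapse to zero.
    new_width = max(1, round(orig_width * scale))
    return new_width, target_height


def gap_input_shape(width, height=64, num_pools=5, channels=512):
    """Shape (channels, h, w) of the feature map entering global average
    pooling, ASSUMING the five stride-2 max-pooling layers of standard
    VGG-16 are retained (each halves height and width, rounding down)."""
    h, w = height, width
    for _ in range(num_pools):
        h, w = h // 2, w // 2
    return channels, h, w


if __name__ == "__main__":
    # A 200x32 word crop is scaled up by 2; a 300x128 crop is scaled down by 2.
    for w, h in [(200, 32), (300, 128)]:
        new_w, new_h = resized_shape(w, h)
        print((w, h), "->", (new_w, new_h), "->", gap_input_shape(new_w, new_h))
```

Under these assumptions the feature-map width varies with the input width, while global average pooling reduces each of the 512 channels to a single value. The fully connected layers therefore always receive a fixed-length vector, which is what allows variable-width inputs.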