method: CNN based method 7

Date: 2017-07-02

Authors: Yash Patel, Michal Bušta, Lukáš Neumann, Jiri Matas

Description: A CNN-based approach is used for script identification in cropped word images. The convolutional layers of the VGG-16 architecture are followed by a global average pooling layer and two fully connected layers. To preserve the aspect ratio of input images in both training and testing, each image is resized to a tensor of fixed height (64) and variable width. For training, the convolutional layers are initialized with ImageNet weights. The network is trained with the categorical cross-entropy loss, and all layers (both convolutional and fully connected) are updated during back-propagation.
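The two size-related ideas in this description can be illustrated with a minimal pure-Python sketch: resizing to a fixed height while keeping the aspect ratio, and global average pooling, which yields one value per channel regardless of width. This is not the authors' code. The function names `resize_dims` and `global_average_pool`, the rounding rule, and the toy feature maps are assumptions made for illustration only.

```python
def resize_dims(width, height, target_height=64):
    """Return (new_width, target_height), keeping the aspect ratio.

    Rounding to the nearest integer is an assumption; the source only
    states that the height is fixed at 64 and the width varies.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def global_average_pool(feature_map):
    """Average each channel over all of its spatial positions.

    feature_map is a list of channels; each channel is a list of rows.
    The result has one value per channel, whatever the width.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy feature maps with 2 channels and different widths (hypothetical data).
narrow = [[[1.0, 3.0]], [[2.0, 2.0]]]
wide = [[[1.0, 2.0, 3.0, 6.0]], [[0.0, 0.0, 4.0, 4.0]]]

print(resize_dims(200, 100))  # (128, 64)
print(len(global_average_pool(narrow)), len(global_average_pool(wide)))  # 2 2
```

Because the pooled vector length depends only on the number of channels, the fully connected layers can receive a fixed-size input even though the word images differ in width.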

method: BLCT

Date: 2017-07-02

Authors: Jan Zdenek, Hideki Nakayama

Description: A CNN is combined with the bag-of-visual-words approach. A patch-based approach is adopted to handle the variable sizes and aspect ratios of the input images. Individual local patches extracted from the training images are used to train a CNN with 6 convolutional layers. All patches from each training image are then fed to the trained CNN, and a feature vector for each patch is extracted from the penultimate layer of the network. Random combinations of three feature vectors are formed into local convolutional triplets, and the 3 vectors in each triplet are added. The local convolutional triplets are used to create a bag-of-visual-words vocabulary of 1024 codewords. Each image is then represented by the codewords assigned to its triplets, which are aggregated into a histogram of occurrences. This histogram serves as the global representation of the image. An MLP with two hidden layers, each followed by dropout, is used for the final classification.
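The triplet and histogram steps described above can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' implementation. The function names, the use of three distinct patches per triplet, the nearest-codeword assignment by squared Euclidean distance, and the tiny 3-codeword codebook (in place of the 1024 codewords mentioned) are all assumptions. How the vocabulary itself is learned is not stated in the description, so the codebook is simply taken as given here.

```python
import random


def make_triplets(features, num_triplets, seed=0):
    """Sum random groups of three patch feature vectors element-wise.

    Requires at least three feature vectors per image.
    """
    rng = random.Random(seed)
    triplets = []
    for _ in range(num_triplets):
        a, b, c = rng.sample(features, 3)  # three distinct patches (assumption)
        triplets.append([x + y + z for x, y, z in zip(a, b, c)])
    return triplets


def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def bovw_histogram(triplets, codebook):
    """Count how often each codeword is the nearest match to a triplet."""
    counts = [0] * len(codebook)
    for t in triplets:
        counts[nearest_codeword(t, codebook)] += 1
    return counts


# Hypothetical 2-D patch features and a 3-codeword vocabulary.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
codebook = [[0.0, 0.0], [3.0, 1.0], [2.0, 2.0]]

triplets = make_triplets(features, num_triplets=5)
hist = bovw_histogram(triplets, codebook)
print(len(hist), sum(hist))  # 3 5
```

The histogram has one bin per codeword, so its length is fixed by the vocabulary size rather than by the number of patches, which is what lets a standard MLP consume it.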

Ranking Table

Date        Method              Script classification accuracy
2017-07-02  CNN based method 7  88.09%
2017-07-02  BLCT                86.34%
