method: CNN based method 7

Date: 2017-07-02

Authors: Yash Patel, Michal Bušta, Lukáš Neumann, Jiri Matas

Description: A CNN-based approach is used for script identification in cropped word images. The convolutional layers of the VGG-16 architecture are followed by a global average pooling layer and two fully connected layers. To preserve the aspect ratio of input images in both training and testing, the images are resized to tensors of fixed height (64) and variable width. For training, the convolutional layers are initialized with ImageNet weights. The categorical cross-entropy loss is used, and all layers (both convolutional and fully connected) are updated during back-propagation.
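
The preprocessing and pooling steps in the description can be illustrated with a small, dependency-free sketch. It is not the authors' code: the function names, the rounding rule for the new width, and the nested-list feature-map layout are assumptions made for illustration. Only the fixed height of 64, the global average pooling step, and the categorical cross-entropy loss come from the description.

```python
import math

TARGET_HEIGHT = 64  # fixed height stated in the description


def resized_shape(height, width, target_height=TARGET_HEIGHT):
    """Return (target_height, new_width) with the aspect ratio preserved.

    Rounding to the nearest integer width is an assumption.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    new_width = max(1, round(width * scale))
    return target_height, new_width


def global_average_pool(feature_map):
    """Average every channel over all of its spatial positions.

    feature_map is a list of channels, each channel a list of rows.
    The result has one value per channel, whatever the spatial width,
    which is why variable-width inputs can feed fixed-size dense layers.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def categorical_cross_entropy(probabilities, true_index):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probabilities[true_index])


# Two hypothetical 2-channel feature maps with different widths.
narrow = [[[1.0, 3.0]], [[2.0, 2.0]]]
wide = [[[1.0, 2.0, 3.0, 6.0]], [[0.0, 0.0, 4.0, 4.0]]]
narrow_vector = global_average_pool(narrow)
wide_vector = global_average_pool(wide)
```

Both pooled vectors have length 2 even though the inputs differ in width, so the fully connected layers never see the variable dimension.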

Confusion Matrix

Rows: ground truth (GT). Columns: detection.

GT \ Detection  Arabic  Latin  Chinese  Japanese  Korean  Bangla  Symbols  Mixed  None
Arabic          475130212331991600
Latin           183588492455654679813000
Chinese         2227135408178011900
Japanese        5120399804528488502100
Korean          462302436519962656700
Bangla          32131925192266000
Symbols         65907938338243600
Mixed           0       0      0        0         0       0       0        0      0
None            0       0      0        0         0       0       0        0      0
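
A confusion matrix with ground-truth rows and detection columns, like the one above, is usually summarized as per-class recall and overall accuracy. The sketch below shows that computation under stated assumptions: the function names are hypothetical and the 3x3 matrix is made-up illustration data, not the results reported here.

```python
def per_class_recall(matrix):
    """Recall per ground-truth row: diagonal count divided by row total.

    Rows with no ground-truth samples (all zeros) yield None.
    """
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else None)
    return recalls


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else None


# Made-up 3-class example (rows: ground truth, columns: detection).
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],  # a class with no ground-truth samples
]
recalls = per_class_recall(example)
accuracy = overall_accuracy(example)
```

Returning None for an empty row avoids a division by zero for classes that have no ground-truth samples, as is the case for the all-zero Mixed and None rows.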