method: NXB OCR (2019-06-03)

Authors: Yupeng Cao (X), Qiufeng Wang* (X), Qi Qu (B), Jing Li (X), Cheng Cheng* (N), Kaizhu Huang* (X) (Equal Contribution)

Description: A CNN-based method is used to train a script identification classifier on cropped word images [1]. We use the VGG19 architecture as the model. Input images are resized to 32×32. We add batch normalization to each convolutional layer and use max pooling for the pooling layers.
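To make the effect of the 32×32 input size concrete, the following is a minimal sketch that traces feature-map shapes through a VGG19-style convolutional stack with batch normalization. It assumes the standard VGG19 layout (16 convolutional layers in five blocks, 3×3 convolutions with padding 1, 2×2 max pooling with stride 2) and a 3-channel input; the description does not confirm these details, and the names `VGG19_CFG` and `trace_vgg19_bn` are hypothetical.

```python
# Assumed standard VGG19 layout (not confirmed by the description):
# integers are conv output channels, "M" marks a max-pooling layer.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def trace_vgg19_bn(height=32, width=32, in_channels=3, cfg=VGG19_CFG):
    """Return a list of (layer_name, channels, height, width) after each step."""
    trace = []
    channels = in_channels
    for item in cfg:
        if item == "M":
            # 2x2 max pooling with stride 2 halves each spatial dimension.
            height, width = height // 2, width // 2
            trace.append(("maxpool2x2", channels, height, width))
        else:
            # A 3x3 conv with padding 1 keeps the spatial size; batch
            # normalization and ReLU leave the shape unchanged.
            channels = item
            trace.append(("conv3x3+bn+relu", channels, height, width))
    return trace


if __name__ == "__main__":
    for name, c, h, w in trace_vgg19_bn():
        print(f"{name:16s} -> {c:3d} x {h:2d} x {w:2d}")
```

Under these assumptions, the five pooling stages reduce a 32×32 input to a 512×1×1 feature map, so the classifier head after the convolutional stack would see a 512-dimensional vector.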

P.S. Affiliation of Authors
(X: Xi’an Jiaotong-Liverpool University;
N: Institute of Nanotechnology and Nano-Bionics, Chinese Academy of Sciences;
B: Beijing Babel Technology Co., Ltd.)

[1] Bušta M, Patel Y, Matas J. E2E-MLT - an unconstrained end-to-end method for multi-language scene text[J]. arXiv preprint arXiv:1801.09919, 2018.

Confusion Matrix

Rows: ground truth (GT). Columns: detection.
Columns: Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None

GT Arabic: 453147318543599130
GT Latin: 304578354038256621761402920
GT Chinese: 1946930551062902416150
GT Japanese: 942351128639233196672460
GT Korean: 99262751566789646332250
GT Bangla: 4224183116209615510
GT Hindi: 610959271402020
GT Symbols: 63119281743081525250
GT None: 0 0 0 0 0 0 0 0 0
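
As a reading aid for a matrix laid out this way (rows are ground truth, columns are detections), the sketch below computes per-class recall and overall accuracy. The helper names `per_class_recall` and `overall_accuracy` are hypothetical, and the toy counts are made up for illustration; they are not taken from the table above.

```python
def per_class_recall(matrix, labels):
    """Recall per ground-truth class: diagonal count divided by the row total."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        # A class with no ground-truth samples has undefined recall.
        recalls[label] = matrix[i][i] / row_total if row_total else None
    return recalls


def overall_accuracy(matrix):
    """Fraction of all samples that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else None


# Hypothetical 3-class toy matrix (rows: ground truth, columns: detection).
toy_labels = ["A", "B", "C"]
toy_matrix = [
    [8, 1, 1],  # ground truth A
    [2, 6, 2],  # ground truth B
    [0, 0, 0],  # ground truth C: no samples, like the all-zero None row
]

if __name__ == "__main__":
    print(per_class_recall(toy_matrix, toy_labels))
    print(overall_accuracy(toy_matrix))
```

An all-zero row, such as the None row, contributes no ground-truth samples, so its recall is reported as undefined rather than zero.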