method: GSPA_HUST (2019-05-28)

Authors: Changxu Cheng, Qiuhui Huang, Wuheng Xu and Hao Wang at Huazhong University of Science and Technology

Description: We use a Global Squeezer (GS) and a Patch Aggregator (PA) to extract global and local features, respectively, from the full-size cropped text images. GS is a branch consisting of global average pooling (GAP) followed by a linear classifier, which squeezes the global features. PA makes full use of local predictions to aggregate locally discriminative features. A softermax loss is used for intermediate supervision. In the training phase, grouping resizing is adopted to enable batch training, where all samples in a batch must have the same size: images with similar aspect ratios are grouped together and resized to a proper fixed aspect ratio. Data augmentation is also used to make the model more robust. The backbone is VGG16.
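As a rough illustration only, here is a minimal pure-Python sketch of the two heads, assuming a backbone feature map indexed `[channel][row][col]`. The GS head follows the description (GAP, then a linear classifier). The PA head is an assumption, because the description does not say how local predictions are aggregated: here every spatial position is scored by a shared linear classifier and the per-position class probabilities are averaged. The function names, the toy weights, and the idea of reusing one weight matrix for both heads are hypothetical; how GS and PA outputs are fused, and the softermax loss, are not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Linear classifier: one score per class, w_k . x + b_k."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def global_squeezer(fmap: FeatureMap, weights: Matrix, bias: Vector) -> Vector:
    """GS head: global average pooling (GAP) per channel, then a linear classifier."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    return softmax(linear(weights, bias, pooled))


def patch_aggregator(fmap: FeatureMap, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical PA head: classify each spatial position, average the probabilities."""
    height, width = len(fmap[0]), len(fmap[0][0])
    totals = [0.0] * len(bias)
    for i in range(height):
        for j in range(width):
            local_feature = [ch[i][j] for ch in fmap]  # C-dim vector at (i, j)
            probs = softmax(linear(weights, bias, local_feature))
            totals = [t + p for t, p in zip(totals, probs)]
    return [t / (height * width) for t in totals]


if __name__ == "__main__":
    fmap = [[[1.0, 3.0]], [[0.0, 0.0]]]  # 2 channels on a 1x2 grid
    w = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-class weights, reused by both heads
    b = [0.0, 0.0]
    print([round(p, 3) for p in global_squeezer(fmap, w, b)])  # [0.881, 0.119]
    print([round(p, 3) for p in patch_aggregator(fmap, w, b)])  # [0.842, 0.158]
```

On the toy input the two heads disagree (about 0.88 versus 0.84 for class 0) because classifying the averaged feature is not the same as averaging per-position predictions; the local branch keeps evidence from individual positions instead of squeezing it away first.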
The final version.
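Grouping resizing can be sketched as bucketing images by aspect ratio. In this minimal pure-Python sketch, the target height of 32 pixels and the candidate ratios (1, 2, 3, 4, 6, 8) are hypothetical placeholders, since the description gives neither. Each image is assigned to the nearest candidate width/height ratio, and batches are drawn only within one bucket, so every sample in a batch is resized to the same (width, height). Only the bookkeeping is shown; the pixel resize itself would be done by whatever image library the training pipeline uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical settings: the method description does not give these values.
TARGET_HEIGHT = 32
CANDIDATE_RATIOS: Tuple[float, ...] = (1.0, 2.0, 3.0, 4.0, 6.0, 8.0)


def nearest_ratio(width: int, height: int,
                  ratios: Sequence[float] = CANDIDATE_RATIOS) -> float:
    """Return the candidate width/height ratio closest to the image's own."""
    aspect = width / height
    return min(ratios, key=lambda r: abs(r - aspect))


def group_by_ratio(sizes: Sequence[Tuple[int, int]],
                   ratios: Sequence[float] = CANDIDATE_RATIOS) -> Dict[float, List[int]]:
    """Map each candidate ratio to the indices of the images assigned to it."""
    groups: Dict[float, List[int]] = defaultdict(list)
    for idx, (w, h) in enumerate(sizes):
        groups[nearest_ratio(w, h, ratios)].append(idx)
    return dict(groups)


def target_size(ratio: float, height: int = TARGET_HEIGHT) -> Tuple[int, int]:
    """Fixed (width, height) that every image in a ratio group is resized to."""
    return (round(height * ratio), height)


def make_batches(sizes: Sequence[Tuple[int, int]], batch_size: int,
                 ratios: Sequence[float] = CANDIDATE_RATIOS
                 ) -> List[Tuple[Tuple[int, int], List[int]]]:
    """Split each ratio group into batches; all images in a batch share one size."""
    batches = []
    for ratio, indices in sorted(group_by_ratio(sizes, ratios).items()):
        size = target_size(ratio)
        for start in range(0, len(indices), batch_size):
            batches.append((size, indices[start:start + batch_size]))
    return batches


if __name__ == "__main__":
    # (width, height) of five hypothetical cropped word images
    sizes = [(100, 50), (95, 48), (300, 40), (64, 64), (410, 50)]
    for size, batch in make_batches(sizes, batch_size=2):
        print(size, batch)
    # (32, 32) [3]
    # (64, 32) [0, 1]
    # (256, 32) [2, 4]
```

Compared with resizing every crop to one global size, snapping each image to the nearest of several ratios distorts long and short words much less, at the cost of each batch containing only samples from a single bucket.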

Confusion Matrix

Detection (columns): Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None
GT (rows):
Arabic: 49301587112032110
Latin: 3465885922736651478561910
Chinese: 1122739514657093140
Japanese: 761731112047883621838240
Korean: 30135820619211161162090
Bangla: 91388818224711610
Hindi: 4241312418810
Symbols: 62686756543531420
None: 000000000