method: Res_MUL_SPP_BUPT (2019-06-03)

Authors: Xiaoying Hou (Beijing University of Posts and Telecommunications), Qinyi Zhang (Beijing University of Posts and Telecommunications)

Description: In this task, we aim to recognize the script of the words in each image. We use the classes Arabic, Latin, Symbols, Bangla, None, Chinese, Japanese, Hindi, Korean, and Mixed. We use ResNet as the baseline model. The main problem in Task 2 is that the images come in various sizes, so we add an SPP (spatial pyramid pooling) layer before the fully-connected layer. Furthermore, we cluster the input images into 5 clusters using K-means, and each batch draws data from every cluster. We also use data augmentation, such as adding Gaussian noise.
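To illustrate why an SPP layer removes the fixed-input-size constraint, here is a minimal pure-Python sketch of spatial pyramid max pooling. It is not the authors' implementation: the function name `spp_max_pool` and the pyramid levels `(1, 2, 4)` are assumptions made for illustration, and a real model would use a deep-learning framework's pooling ops on tensors instead of nested lists.

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over one channel-first feature map.

    feature_map: nested list shaped [C][H][W]; H and W may differ per image.
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    Returns a flat list of length C * sum(n * n for n in levels).
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    pooled = []
    for n in levels:
        for c in range(channels):
            for i in range(n):
                # Bin edges: floor for the start, ceil for the end, so every
                # bin covers at least one row even when height < n.
                r0 = (i * height) // n
                r1 = math.ceil((i + 1) * height / n)
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = math.ceil((j + 1) * width / n)
                    pooled.append(max(
                        feature_map[c][r][k]
                        for r in range(r0, r1)
                        for k in range(c0, c1)
                    ))
    return pooled


# Two feature maps with different spatial sizes but the same channel count.
small = [[[float(r + k) for k in range(9)] for r in range(5)] for _ in range(3)]
large = [[[float(r * k) for k in range(31)] for r in range(12)] for _ in range(3)]

# Both produce vectors of identical length, so one fully-connected layer fits both.
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on the channel count and the pyramid levels, the fully-connected layer after it sees a fixed-size input regardless of the word image's width and height.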

Confusion Matrix

Detection (columns): Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None
GT (rows):
Arabic 37379282526116871150
Latin 66552062761514814272171342230
Chinese 26117817001616188163230
Japanese 90297454639754673427440
Korean 11632091053284756744515330
Bangla 144736737152139716740
Hindi 830592312306862777140
Symbols 180161715364911717400
None 0 0 0 0 0 0 0 0 0