method: Res_SPP_BUPT (2019-06-02)

Authors: Xiaoying Hou (Beijing University of Posts and Telecommunications), Qinyi Zhang (Beijing University of Posts and Telecommunications)

Description: In this task, we aim to recognize the script of the words in the image. We use ten classes: Arabic, Latin, Symbols, Bangla, None, Chinese, Japanese, Hindi, Korean, and Mixed. We use ResNet as the baseline model. The main problem in Task 2 is that the images come in various sizes, so we adopt an SPP (spatial pyramid pooling) layer before the fully-connected layer. Furthermore, we cluster the input images into 5 clusters using K-means, and in each batch we choose data from each cluster. We also use data augmentation, such as adding Gaussian noise.
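
The two ideas above (pooling variable-size feature maps into a fixed-length vector, and drawing each batch from every K-means cluster) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the submitted implementation: the pyramid levels (1, 2, 4), the use of image (width, height) as the clustering feature, and the helper names `spp_max_pool`, `kmeans`, and `balanced_batch` are all assumptions made for this example.

```python
import random


def spp_max_pool(fmap, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n the map is split into n x n bins, so the output
    length is sum(n * n for n in levels) regardless of the input size.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin start uses floor and bin end uses ceil, so bins are never empty.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialisation: k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: (p[0] - centers[c][0]) ** 2
                      + (p[1] - centers[c][1]) ** 2)
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


def balanced_batch(labels, k, per_cluster, rng):
    """Pick `per_cluster` sample indices (with replacement) from each cluster."""
    batch = []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        if idx:
            batch.extend(rng.choices(idx, k=per_cluster))
    return batch
```

With these levels, a 3x5 map and a 7x2 map both pool to 1 + 4 + 16 = 21 values per channel, which is what lets a fully-connected layer follow a backbone fed with images of different sizes. In the setting described above, `k` would be 5.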

Confusion Matrix

Detection (columns): Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None
GT (rows):
Arabic: 3288122843326165330590
Latin: 1196425121976106242525116906350
Chinese: 59104313501804412660160
Japanese: 136278459637137231410640
Korean: 18436779573542435916001130
Bangla: 1452284569861262080
Hindi: 161055878259621330120
Symbols: 1621240775948043018190
None: 000000000