method: Res_BUPT (2019-06-03)

Authors: Xiaoying Hou (Beijing University of Posts and Telecommunications), Qinyi Zhang (Beijing University of Posts and Telecommunications)

Description: In this task, we aim to recognize the script of the words in an image. We use the following classes: Arabic, Latin, Symbols, Bangla, None, Chinese, Japanese, Hindi, Korean, and Mixed. We use ResNet as the baseline model. The main problem in Task 2 is that the images come in various sizes, so we adopt an SPP (spatial pyramid pooling) layer before the fully-connected layer; it produces a fixed-length feature vector regardless of the input size. Furthermore, we cluster the input images into 5 clusters using K-means, and in each batch we choose data from every cluster. We also use data augmentation, such as adding Gaussian noise.
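
To illustrate why an SPP layer removes the dependence on image size, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `spp_max_pool` and `ceil_div` are hypothetical, the pyramid levels (1, 2, 4) are an assumption (the description does not state them), and the sketch pools a single 2-D feature map, whereas a real network would apply the same pooling to every channel of the last convolutional output.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an H x W feature map into a fixed-length vector.

    feature_map: list of rows, any H >= 1 and W >= 1.
    levels: assumed pyramid levels; level n splits the map into n x n bins.
    Returns a list of length sum(n * n for n in levels).
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor the start and ceil the end of each bin, so that
            # every bin is non-empty even when the map is smaller than n.
            r0, r1 = (i * h) // n, ceil_div((i + 1) * h, n)
            for j in range(n):
                c0, c1 = (j * w) // n, ceil_div((j + 1) * w, n)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two maps of different sizes yield vectors of the same length.
small = [[1, 2, 3], [4, 5, 6]]                              # 2 x 3
large = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on the pyramid levels, the fully-connected layer that follows can keep a fixed input dimension while the images themselves vary in width and height.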

Confusion Matrix

Rows: ground truth (GT). Columns: detection.
Columns: Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None

Arabic: 30531550430255296170
Latin: 62043056161872111841131241420
Chinese: 1898236284348121540
Japanese: 71343842062461031855200
Korean: 774465668612415592737170
Bangla: 17636618129100824140
Hindi: 17632583263716223980
Symbols: 73190870122113212950
None: 000000000