method: Res_BUPT_2

Date: 2019-06-03

Authors: Xiaoying Hou (Beijing University of Posts and Telecommunications), Qinyi Zhang (Beijing University of Posts and Telecommunications)

Description: In this task, we aim to recognize the script of the words in each image. The classes are Arabic, Latin, Symbols, Bangla, None, Chinese, Japanese, Hindi, Korean, and Mixed. We use ResNet as the baseline model. The main problem in Task 2 is that the images come in various sizes, so we add an SPP (spatial pyramid pooling) layer before the fully-connected layer. Furthermore, we cluster the input images into 5 clusters using K-means, and each batch draws data from each cluster. We also use data augmentation, such as adding Gaussian noise.
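To illustrate why an SPP layer helps with variable image sizes, here is a minimal sketch of spatial pyramid max pooling on a single 2-D feature map. It is not the authors' implementation: the function name `spp_max_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration, and the submission does not state its actual pyramid configuration. The sketch uses plain Python lists for one channel; a real network would apply the same pooling to every channel of a tensor.

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Pool one 2-D feature map (a list of rows) into a fixed-length vector.

    `levels` is a hypothetical pyramid configuration: level n splits the map
    into an n x n grid and keeps the maximum of each cell.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor the start and ceil the end so the bins cover the whole
            # map and are never empty, even when the map is smaller than n.
            top = (i * height) // n
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = (j * width) // n
                right = math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes yield vectors of the same length,
# so both can feed the same fully-connected layer.
small = [[1, 2, 3], [4, 5, 6]]
wide = [[r * 10 + c for c in range(10)] for r in range(4)]
print(len(spp_max_pool(small)), len(spp_max_pool(wide)))
```

With levels `(1, 2, 4)` the output length per channel is 1 + 4 + 16 = 21 regardless of the input height and width, which is what lets a fixed-size fully-connected layer follow a convolutional backbone fed with images of varying size.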

Confusion Matrix

Detection (columns): Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None
GT (rows):
Arabic | 285316245907468590
Latin | 6174400414657311309369274960
Chinese | 3694935908648211730
Japanese | 98351740352251266380130
Korean | 1094011705611715331025860
Bangla | 65075793898445800
Hindi | 954650864557258950
Symbols | 118205483815322279290
None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0