method: Res_SPP_BUPT
Date: 2019-06-02
Authors: Xiaoying Hou (Beijing University of Posts and Telecommunications), Qinyi Zhang (Beijing University of Posts and Telecommunications)
Description: In this task, we aim to recognize the script of the words in each image. The classes are Arabic, Latin, Symbols, Bangla, None, Chinese, Japanese, Hindi, Korean, and Mixed. We use ResNet as the baseline model. The main difficulty of Task 2 is that the images come in various sizes, so we place an SPP (spatial pyramid pooling) layer before the fully connected layer. Furthermore, we group the input images into 5 clusters using K-means, and each batch draws data from every cluster. We also use data augmentation, such as adding Gaussian noise.
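To illustrate why an SPP layer helps with variable-size inputs, here is a minimal sketch of spatial pyramid max pooling on a single-channel feature map. It is not the authors' implementation: the function name `spp_max_pool`, the pyramid levels `(1, 2, 4)`, and the pure-Python list representation are all assumptions chosen for readability. The point is only that feature maps of any height and width are reduced to a vector of the same length, which a fully connected layer can then consume.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins and
    the maximum of each bin is kept, so the output length is sum(n * n)
    regardless of the input size. Levels (1, 2, 4) are an assumed choice.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start, ceiling for the bin end, so the bins
            # cover the whole map; max(...) keeps a bin non-empty when h < n.
            r0 = (i * h) // n
            r1 = max(r0 + 1, ((i + 1) * h + n - 1) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = max(c0 + 1, ((j + 1) * w + n - 1) // n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[1, 2, 3],
         [4, 5, 6]]
large = [[r * 7 + c for c in range(7)] for r in range(5)]
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

In a real network the same pooling would be applied per channel to the last convolutional feature map, and the concatenated result would feed the fully connected layer.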
Confusion Matrix
GT \ Detection | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None
---|---|---|---|---|---|---|---|---|---
Arabic | 3288 | 1228 | 43 | 326 | 165 | 33 | 0 | 59 | 0
Latin | 1196 | 42512 | 1976 | 10624 | 2525 | 1169 | 0 | 635 | 0
Chinese | 59 | 1043 | 1350 | 1804 | 412 | 66 | 0 | 16 | 0
Japanese | 136 | 2784 | 596 | 3713 | 723 | 141 | 0 | 64 | 0
Korean | 184 | 3677 | 957 | 3542 | 4359 | 160 | 0 | 113 | 0
Bangla | 14 | 522 | 84 | 569 | 86 | 1262 | 0 | 8 | 0
Hindi | 16 | 1055 | 87 | 825 | 96 | 2133 | 0 | 12 | 0
Symbols | 162 | 1240 | 77 | 594 | 80 | 43 | 0 | 1819 | 0
None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
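The confusion matrix can be summarized with per-class recall (the diagonal entry divided by the row total) and overall accuracy (the diagonal sum divided by the total count). The sketch below is one way to compute these, assuming rows are ground truth and columns are detections; the names `LABELS`, `CONFUSION`, `per_class_recall`, and `overall_accuracy` are illustrative and not part of the original submission. Classes with no ground-truth samples (here, None) get a recall of `None` rather than a division by zero.

```python
LABELS = ["Arabic", "Latin", "Chinese", "Japanese", "Korean",
          "Bangla", "Hindi", "Symbols", "None"]

# Rows: ground truth; columns: detections (same order as LABELS).
CONFUSION = [
    [3288, 1228, 43, 326, 165, 33, 0, 59, 0],
    [1196, 42512, 1976, 10624, 2525, 1169, 0, 635, 0],
    [59, 1043, 1350, 1804, 412, 66, 0, 16, 0],
    [136, 2784, 596, 3713, 723, 141, 0, 64, 0],
    [184, 3677, 957, 3542, 4359, 160, 0, 113, 0],
    [14, 522, 84, 569, 86, 1262, 0, 8, 0],
    [16, 1055, 87, 825, 96, 2133, 0, 12, 0],
    [162, 1240, 77, 594, 80, 43, 0, 1819, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def per_class_recall(matrix, labels):
    """Return {label: recall}, or None for a class with no ground-truth samples."""
    recall = {}
    for i, (label, row) in enumerate(zip(labels, matrix)):
        total = sum(row)
        recall[label] = row[i] / total if total else None
    return recall


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, value in per_class_recall(CONFUSION, LABELS).items():
    print(label, "n/a" if value is None else f"{value:.3f}")
print("accuracy", f"{overall_accuracy(CONFUSION):.3f}")
```

Note that the Hindi column is all zeros, so Hindi recall is 0; most Hindi samples (2133) are detected as Bangla.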