- Task 2 - Script identification - Method: Tencent-DPPR Team (Method_v0.2)
Method: Tencent-DPPR Team (Method_v0.2)
Date: 2019-05-27
Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao
Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines and the character-level language type of each character using an ensemble of several recognition models, built on CTC/Seq2Seq and on CNNs combined with self-attention/RNN layers. We then identify the language type of each recognized result using statistics from the MLT-2019 data and a Wikipedia corpus.
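The description does not say how corpus statistics turn a recognized string into a language type. One common way to do this, shown here purely as an assumption and not as the team's method, is a character-level naive Bayes classifier: count characters per script in a reference corpus, then pick the script under which the recognized characters are most likely. The functions `build_model` and `identify_script`, the smoothing parameter `alpha`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def build_model(corpus_by_script):
    """Count characters per script from {script: [text, ...]}.

    The corpus passed in is a hypothetical stand-in for the MLT-2019 and
    Wikipedia statistics mentioned in the description.
    """
    counts = {
        script: Counter(ch for text in texts for ch in text)
        for script, texts in corpus_by_script.items()
    }
    totals = {script: sum(c.values()) for script, c in counts.items()}
    vocab_size = len({ch for c in counts.values() for ch in c})
    return counts, totals, vocab_size


def identify_script(text, model, alpha=1.0):
    """Return the script whose smoothed character distribution best explains text."""
    counts, totals, vocab_size = model
    best_script, best_score = None, -math.inf
    for script, char_counts in counts.items():
        denom = totals[script] + alpha * vocab_size
        # Sum of log P(char | script) with add-alpha (Laplace) smoothing;
        # a uniform prior over scripts is assumed, so it is omitted.
        score = sum(math.log((char_counts[ch] + alpha) / denom) for ch in text)
        if score > best_score:
            best_script, best_score = script, score
    return best_script


if __name__ == "__main__":
    toy_corpus = {
        "Latin": ["hello world", "the cat"],
        "Chinese": ["你好世界", "中文"],
    }
    model = build_model(toy_corpus)
    print(identify_script("hello", model))  # Latin
    print(identify_script("你好", model))  # Chinese
```

Smoothing keeps characters unseen in a script's corpus from driving its score to negative infinity; in practice one would also fold in the recognizers' per-character language labels, which this sketch ignores.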
Confusion Matrix
Rows are ground-truth (GT) scripts; columns are the detected (predicted) scripts.

| GT \ Detection | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None |
|---|---|---|---|---|---|---|---|---|---|
| Arabic | 4912 | 156 | 18 | 16 | 10 | 5 | 8 | 17 | 0 |
| Latin | 639 | 58414 | 432 | 346 | 196 | 110 | 221 | 279 | 0 |
| Chinese | 12 | 78 | 3970 | 622 | 20 | 7 | 38 | 3 | 0 |
| Japanese | 145 | 1049 | 1420 | 5225 | 120 | 35 | 117 | 46 | 0 |
| Korean | 292 | 1347 | 579 | 286 | 9959 | 89 | 391 | 49 | 0 |
| Bangla | 9 | 52 | 9 | 8 | 14 | 2380 | 70 | 3 | 0 |
| Hindi | 5 | 44 | 0 | 4 | 2 | 14 | 4154 | 1 | 0 |
| Symbols | 46 | 495 | 40 | 134 | 17 | 4 | 10 | 3269 | 0 |
| None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
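Summary figures can be read off the confusion matrix with a few lines of code. The sketch computes overall accuracy, per-class recall (diagonal over row sum) and precision (diagonal over column sum), and the largest off-diagonal confusion. These are plain count-based statistics for illustration; they are not necessarily the official MLT-2019 Task 2 ranking metric, and the function names are hypothetical.

```python
LABELS = ["Arabic", "Latin", "Chinese", "Japanese", "Korean",
          "Bangla", "Hindi", "Symbols", "None"]

# Rows: ground-truth script; columns: detected script (copied from the table).
MATRIX = [
    [4912, 156, 18, 16, 10, 5, 8, 17, 0],
    [639, 58414, 432, 346, 196, 110, 221, 279, 0],
    [12, 78, 3970, 622, 20, 7, 38, 3, 0],
    [145, 1049, 1420, 5225, 120, 35, 117, 46, 0],
    [292, 1347, 579, 286, 9959, 89, 391, 49, 0],
    [9, 52, 9, 8, 14, 2380, 70, 3, 0],
    [5, 44, 0, 4, 2, 14, 4154, 1, 0],
    [46, 495, 40, 134, 17, 4, 10, 3269, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def overall_accuracy(m):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def per_class_recall(m, labels):
    """Diagonal / row sum; None for classes with no ground-truth samples."""
    out = {}
    for i, name in enumerate(labels):
        support = sum(m[i])
        out[name] = m[i][i] / support if support else None
    return out


def per_class_precision(m, labels):
    """Diagonal / column sum; None for classes that were never predicted."""
    out = {}
    for j, name in enumerate(labels):
        predicted = sum(row[j] for row in m)
        out[name] = m[j][j] / predicted if predicted else None
    return out


def largest_confusion(m, labels):
    """Largest off-diagonal cell as (ground truth, detected, count)."""
    return max(
        ((labels[i], labels[j], m[i][j])
         for i in range(len(m)) for j in range(len(m)) if i != j),
        key=lambda cell: cell[2],
    )


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(MATRIX):.4f}")
    recall = per_class_recall(MATRIX, LABELS)
    precision = per_class_precision(MATRIX, LABELS)
    for name in LABELS:
        if recall[name] is not None:
            print(f"{name:9s} recall={recall[name]:.4f} precision={precision[name]:.4f}")
    print("largest confusion:", largest_confusion(MATRIX, LABELS))
```

On these counts the diagonal holds 92,283 of 102,462 samples (overall accuracy about 0.9007), and the single largest confusion is Japanese ground truth detected as Chinese (1,420 samples). The all-zero "None" row and column are skipped to avoid division by zero.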