- Task 2 - Script identification - Method: Tencent-DPPR Team (Method_v0.1)
Method: Tencent-DPPR Team (Method_v0.1)
Date: 2019-05-27
Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao
Description: We are from the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines using the ensembled results of several recognition models, which are based on CTC/Seq2Seq and on CNNs with self-attention/RNN. We then identify the language type of each recognized result based on statistics from the MLT-2019 and Wikipedia corpora.
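The second stage, identifying the script from recognized text using corpus statistics, might look roughly like the sketch below. The description does not say which statistics the team used, so this is an assumption: a character-unigram model with add-alpha smoothing. The names `TOY_CORPUS`, `train_char_stats`, and `identify_script` are hypothetical, and the tiny corpus is a placeholder standing in for MLT-2019 and Wikipedia text.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a stand-in for the MLT-2019 and Wikipedia
# text that the description mentions, which is not reproduced here.
TOY_CORPUS = {
    "Latin": ["hello world", "open daily", "exit"],
    "Chinese": ["你好世界", "出口", "欢迎光临"],
    "Japanese": ["こんにちは世界", "出口はこちら", "ありがとう"],
    "Korean": ["안녕하세요", "출구", "감사합니다"],
}


def train_char_stats(corpus):
    """Count non-whitespace characters per script label."""
    return {
        label: Counter(ch for line in lines for ch in line if not ch.isspace())
        for label, lines in corpus.items()
    }


def identify_script(text, stats, alpha=1.0):
    """Return the label with the highest smoothed unigram log-likelihood.

    Returns None when the text has no non-whitespace characters.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return None
    # Vocabulary size across all labels, plus one slot for unseen characters.
    vocab_size = len(set().union(*stats.values())) + 1
    best_label, best_score = None, -math.inf
    for label, counts in stats.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[ch] + alpha) / (total + alpha * vocab_size))
            for ch in chars
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


stats = train_char_stats(TOY_CORPUS)
print(identify_script("hello", stats))
print(identify_script("こんにちは", stats))
```

Per-character statistics matter most for scripts that share code points, such as Chinese and Japanese, where a plain Unicode-range check cannot separate Han characters on its own.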
Confusion Matrix
| GT (rows) / Detection (columns) | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None |
|---|---|---|---|---|---|---|---|---|---|
| Arabic | 4911 | 158 | 23 | 24 | 5 | 5 | 4 | 12 | 0 |
| Latin | 669 | 58233 | 507 | 420 | 149 | 125 | 235 | 299 | 0 |
| Chinese | 12 | 70 | 3981 | 626 | 15 | 7 | 36 | 3 | 0 |
| Japanese | 142 | 995 | 1456 | 5266 | 84 | 34 | 133 | 47 | 0 |
| Korean | 304 | 1369 | 650 | 345 | 9780 | 94 | 413 | 37 | 0 |
| Bangla | 9 | 55 | 9 | 9 | 4 | 2361 | 95 | 3 | 0 |
| Hindi | 4 | 40 | 0 | 4 | 2 | 6 | 4166 | 2 | 0 |
| Symbols | 43 | 469 | 47 | 168 | 12 | 6 | 13 | 3257 | 0 |
| None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
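
To read the confusion matrix quantitatively, per-script precision and recall and an overall accuracy can be derived from it. The sketch below copies the counts from the table, with rows as ground truth and columns as detections. The helper names `per_class_metrics` and `overall_accuracy` are illustrative, and the accuracy computed here is a plain diagonal-over-total ratio, which may differ from the official challenge score since the scoring protocol is not given on this page.

```python
LABELS = ["Arabic", "Latin", "Chinese", "Japanese", "Korean",
          "Bangla", "Hindi", "Symbols", "None"]

# Rows: ground truth. Columns: detected script. Same order as LABELS.
MATRIX = [
    [4911, 158, 23, 24, 5, 5, 4, 12, 0],
    [669, 58233, 507, 420, 149, 125, 235, 299, 0],
    [12, 70, 3981, 626, 15, 7, 36, 3, 0],
    [142, 995, 1456, 5266, 84, 34, 133, 47, 0],
    [304, 1369, 650, 345, 9780, 94, 413, 37, 0],
    [9, 55, 9, 9, 4, 2361, 95, 3, 0],
    [4, 40, 0, 4, 2, 6, 4166, 2, 0],
    [43, 469, 47, 168, 12, 6, 13, 3257, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)}; None where the denominator is 0."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        gt_total = sum(matrix[i])                  # row sum
        pred_total = sum(row[i] for row in matrix)  # column sum
        precision = true_pos / pred_total if pred_total else None
        recall = true_pos / gt_total if gt_total else None
        metrics[label] = (precision, recall)
    return metrics


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly identified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


for label, (precision, recall) in per_class_metrics(MATRIX, LABELS).items():
    if precision is not None and recall is not None:
        print(f"{label:8s} precision={precision:.3f} recall={recall:.3f}")
print(f"accuracy={overall_accuracy(MATRIX):.4f}")
```

The largest off-diagonal counts in the table are Japanese detected as Chinese (1456) and Korean detected as Latin (1369), so the per-class recall for those two scripts is where most of the error shows up.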