Task 2: Script identification
Method: Tencent-DPPR Team (Method_v0.3), 2019-06-04
Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao
Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines using an ensemble of several recognition models, which are based on CTC/Seq2Seq and on CNNs with self-attention/RNN. We then identify the language type of each recognized result based on statistics from the MLT-2019 and Wikipedia corpora.
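The description does not say how the corpus statistics are applied to a recognized string. One simple reading is a character-unigram naive Bayes classifier: estimate each script's character frequencies from text, then label a recognized string with the script whose model gives it the highest log-likelihood. The sketch below assumes exactly that; `TOY_CORPUS`, `train`, `identify` and the add-one smoothing are hypothetical illustrations, and the toy strings stand in for real corpora such as MLT-2019 or Wikipedia. It is not the team's implementation.

```python
import math
from collections import Counter

# Hypothetical toy text per script; a real system would estimate these
# statistics from large corpora rather than one sentence each.
TOY_CORPUS = {
    "Latin": "the quick brown fox jumps over the lazy dog",
    "Chinese": "我们首先识别文本行然后判断语言类型",
    "Japanese": "わたしたちはまずテキストをにんしきします",
    "Korean": "우리는 먼저 텍스트를 인식합니다",
}


def train(corpus):
    """Count character frequencies per script, ignoring whitespace."""
    models, vocab = {}, set()
    for script, text in corpus.items():
        counts = Counter(ch for ch in text if not ch.isspace())
        models[script] = counts
        vocab.update(counts)
    return models, vocab


def identify(text, models, vocab):
    """Return the script whose add-one-smoothed unigram model assigns the
    recognized text the highest log-likelihood."""
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    best_script, best_score = None, -math.inf
    for script, counts in models.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[ch] + 1) / (total + v))
            for ch in text
            if not ch.isspace()
        )
        if score > best_score:
            best_script, best_score = script, score
    return best_script


models, vocab = train(TOY_CORPUS)
print(identify("テキスト", models, vocab))  # Japanese
```

With such tiny models, characters unseen in every script carry almost no evidence, so strings of digits or punctuation would need a separate rule (for example, for the Symbols class).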
Confusion Matrix
| GT \ Detection | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None |
|---|---|---|---|---|---|---|---|---|---|
| Arabic | 4999 | 101 | 7 | 12 | 4 | 4 | 5 | 10 | 0 |
| Latin | 254 | 59087 | 362 | 265 | 105 | 90 | 88 | 386 | 0 |
| Chinese | 5 | 29 | 4324 | 370 | 4 | 6 | 7 | 5 | 0 |
| Japanese | 73 | 700 | 1226 | 5990 | 44 | 14 | 53 | 57 | 0 |
| Korean | 173 | 1112 | 397 | 220 | 10813 | 80 | 148 | 49 | 0 |
| Bangla | 10 | 41 | 7 | 4 | 3 | 2437 | 43 | 0 | 0 |
| Hindi | 6 | 24 | 0 | 1 | 0 | 3 | 4186 | 4 | 0 |
| Symbols | 24 | 258 | 43 | 58 | 5 | 4 | 10 | 3613 | 0 |
| None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
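The confusion matrix can be summarized with generic per-class metrics. The sketch below transcribes the table into Python and computes overall accuracy (diagonal over total), per-script recall (row-normalized) and precision (column-normalized), and the largest off-diagonal confusion. The names `MATRIX`, `accuracy`, `recall`, `precision` and `top_confusion` are illustrative, and these standard definitions are not necessarily the metric the competition uses for ranking. Recall and precision return `None` when a class has no instances, as with the all-zero None row and column.

```python
# Confusion matrix transcribed from the table: rows are ground-truth (GT)
# scripts, columns are detected scripts, both in LABELS order.
LABELS = ["Arabic", "Latin", "Chinese", "Japanese", "Korean",
          "Bangla", "Hindi", "Symbols", "None"]
MATRIX = [
    [4999, 101, 7, 12, 4, 4, 5, 10, 0],
    [254, 59087, 362, 265, 105, 90, 88, 386, 0],
    [5, 29, 4324, 370, 4, 6, 7, 5, 0],
    [73, 700, 1226, 5990, 44, 14, 53, 57, 0],
    [173, 1112, 397, 220, 10813, 80, 148, 49, 0],
    [10, 41, 7, 4, 3, 2437, 43, 0, 0],
    [6, 24, 0, 1, 0, 3, 4186, 4, 0],
    [24, 258, 43, 58, 5, 4, 10, 3613, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def accuracy(m):
    """Fraction of all GT instances that lie on the diagonal."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total


def recall(m, i):
    """Row-normalized diagonal entry; None if class i has no GT instances."""
    support = sum(m[i])
    return m[i][i] / support if support else None


def precision(m, j):
    """Column-normalized diagonal entry; None if nothing was detected as j."""
    detected = sum(row[j] for row in m)
    return m[j][j] / detected if detected else None


def top_confusion(m):
    """Largest off-diagonal cell as (gt_index, detected_index, count)."""
    n = len(m)
    return max(
        ((i, j, m[i][j]) for i in range(n) for j in range(n) if i != j),
        key=lambda cell: cell[2],
    )


for k, name in enumerate(LABELS):
    r, p = recall(MATRIX, k), precision(MATRIX, k)
    if r is not None and p is not None:
        print(f"{name:8s} recall={r:.3f} precision={p:.3f}")
print(f"accuracy={accuracy(MATRIX):.4f}")
gt, det, count = top_confusion(MATRIX)
print(f"most frequent error: {LABELS[gt]} -> {LABELS[det]} ({count})")  # Japanese -> Chinese (1226)
```

On these counts the most frequent error is Japanese ground truth detected as Chinese, which is consistent with Japanese text often containing kanji that are shared with Chinese.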