Method: Tencent-DPPR Team (2019-06-04)

Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are from the Tencent-DPPR (Data Platform Precision Recommendation) team. We first recognize text lines and their character-level language types by ensembling the results of several recognition models, which are based on CTC/Seq2Seq decoders and CNN encoders with self-attention/RNN. We then identify the language type of each recognized result using statistics from the MLT-2019 and Wikipedia corpora.
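The description does not say how character-level language types are turned into a single label per text line. As a minimal sketch only, and not the team's actual rule, one plausible approach is a majority vote over the per-character labels, with ties broken by a corpus-derived prior. The function name `identify_script`, the `PRIOR` table, and its values are hypothetical placeholders, not MLT-2019 or Wikipedia statistics.

```python
from collections import Counter

# Hypothetical corpus priors (relative script frequency in a reference corpus).
# Placeholder values for illustration only; not real MLT-2019 statistics.
PRIOR = {"Latin": 0.5, "Japanese": 0.3, "Chinese": 0.2}


def identify_script(char_labels, prior=PRIOR):
    """Pick one script for a text line from its per-character script labels."""
    # Symbols (digits, punctuation) carry no script evidence, so skip them.
    votes = Counter(label for label in char_labels if label != "Symbols")
    if not votes:
        # A line made only of symbols is labeled as Symbols.
        return "Symbols"
    # The script with the most votes wins; ties go to the larger prior.
    return max(votes, key=lambda s: (votes[s], prior.get(s, 0.0)))


if __name__ == "__main__":
    print(identify_script(["Latin", "Latin", "Symbols", "Chinese"]))
    print(identify_script(["Chinese", "Japanese"]))
```

Under this assumption the prior only matters when the character votes are tied, which is the case for short lines made of characters shared between scripts, such as Chinese and Japanese.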

Confusion Matrix

Detection
Arabic  Latin  Chinese  Japanese  Korean  Bangla  Hindi  Symbols  None
GT
Arabic  5003102711332110
Latin  2245941327819912747383110
Chinese  530440428865750
Japanese  7976910496075721447520
Korean  1141026299152112394684320
Bangla  114474624422920
Hindi  6290108417820
Symbols  233093439102235960
None  0  0  0  0  0  0  0  0  0