method: Tencent-DPPR Team (Method_v0.3) 2019-06-04

Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines using an ensemble of several recognition models based on CTC or Seq2Seq, built from CNNs combined with self-attention or RNNs. We then identify the language type of each recognized result using statistics from the MLT-2019 and Wikipedia corpora.
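The description does not say how the ensemble outputs are combined or which statistics drive language identification. The following is a minimal sketch under stated assumptions, not the team's method: it assumes confidence-weighted voting over candidate transcriptions and a character-unigram naive Bayes classifier with add-one smoothing. The function names, the "Symbols" rule, and the toy corpora (stand-ins for MLT-2019 and Wikipedia text) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ensemble_transcription(candidates):
    """Return the transcription with the highest summed confidence.

    candidates: list of (text, confidence) pairs, one per recognition model.
    Confidence-weighted voting is an assumed combination rule.
    """
    scores = defaultdict(float)
    for text, conf in candidates:
        scores[text] += conf
    return max(scores, key=scores.get)


def train_char_model(corpora):
    """Count non-space characters per script.

    corpora: dict mapping script name -> list of training strings
    (hypothetical stand-in for MLT-2019 / Wikipedia text).
    """
    models, vocab = {}, set()
    for script, texts in corpora.items():
        counts = Counter(ch for t in texts for ch in t if not ch.isspace())
        models[script] = counts
        vocab.update(counts)
    return models, vocab


def identify_script(text, models, vocab):
    """Character-unigram naive Bayes with add-one smoothing and a uniform prior."""
    chars = [ch for ch in text if not ch.isspace()]
    # Assumed rule: strings containing no letters are labelled "Symbols".
    if not any(ch.isalpha() for ch in chars):
        return "Symbols"
    size = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    best, best_score = None, -math.inf
    for script, counts in models.items():
        total = sum(counts.values())
        # Sum of log-probabilities; Counter returns 0 for unseen characters.
        score = sum(math.log((counts[ch] + 1) / (total + size)) for ch in chars)
        if score > best_score:
            best, best_score = script, score
    return best


# Toy corpora (hypothetical); real statistics would come from far larger text.
corpora = {
    "Latin": ["hello world", "text recognition"],
    "Chinese": ["中文文本识别", "你好世界"],
    "Japanese": ["こんにちは世界", "文字認識です"],
    "Korean": ["안녕하세요 세계", "문자 인식"],
}
models, vocab = train_char_model(corpora)

text = ensemble_transcription([("hello", 0.9), ("hallo", 0.6), ("hello", 0.5)])
print(text, identify_script(text, models, vocab))
```

Strings made only of Han characters that occur in both the Chinese and Japanese corpora are decided purely by relative frequencies, so corpus size and smoothing matter most for those cases.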

Confusion Matrix

Detection (columns): Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None
GT (rows):
Arabic: 4999101712445100
Latin: 2545908736226510590883860
Chinese: 529432437046750
Japanese: 7370012265990441453570
Korean: 17311123972201081380148490
Bangla: 104174324374300
Hindi: 6240103418640
Symbols: 242584358541036130
None: 0 0 0 0 0 0 0 0 0