Method: Tencent-DPPR Team (Method_v0.2), 2019-05-27

Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines and their character-level language types using an ensemble of several recognition models, which are built on CTC or Seq2Seq decoding and on CNNs combined with self-attention or RNNs. We then identify the language type of each recognized result based on statistics from the MLT-2019 and Wikipedia corpora.
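The entry does not say how the character-level language types and the corpus statistics are combined into a line-level decision. The following is a minimal sketch of one plausible scheme, under stated assumptions, and is not the team's method. Unicode code-point ranges stand in for the recognizer's per-character language outputs. A majority vote picks the line language. Kana marks a line as Japanese, and Han characters alone default to Chinese. A hypothetical `PRIOR` table stands in for corpus statistics and is used only to break ties. The names `char_script`, `line_language`, and `PRIOR`, and all prior values, are illustrative assumptions.

```python
from collections import Counter

# Simplified Unicode ranges per script (an assumption; real systems use fuller tables).
SCRIPT_RANGES = [
    ("Arabic", 0x0600, 0x06FF),
    ("Hindi", 0x0900, 0x097F),     # Devanagari block
    ("Bangla", 0x0980, 0x09FF),
    ("Korean", 0x1100, 0x11FF),    # Hangul Jamo
    ("Japanese", 0x3040, 0x30FF),  # Hiragana + Katakana
    ("Korean", 0x3130, 0x318F),    # Hangul Compatibility Jamo
    ("Han", 0x4E00, 0x9FFF),       # CJK ideographs, shared by Chinese and Japanese
    ("Korean", 0xAC00, 0xD7AF),    # Hangul Syllables
]

# Hypothetical prior weights standing in for corpus statistics (tie-breaking only).
PRIOR = {"Latin": 6, "Chinese": 5, "Japanese": 4, "Arabic": 3,
         "Korean": 3, "Hindi": 2, "Bangla": 2}


def char_script(ch):
    """Map one character to a coarse script label; anything unmatched is 'Symbols'."""
    cp = ord(ch)
    if ch.isalpha() and (ch.isascii() or 0x00C0 <= cp <= 0x024F):
        return "Latin"
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Symbols"


def line_language(text, prior=PRIOR):
    """Majority vote over per-character labels, ignoring whitespace and symbols."""
    counts = Counter(char_script(c) for c in text if not c.isspace())
    counts.pop("Symbols", None)
    if not counts:
        return "Symbols"
    # Kana implies Japanese, so Han characters count toward Japanese;
    # otherwise Han characters alone count toward Chinese.
    if counts.get("Japanese"):
        counts["Japanese"] += counts.pop("Han", 0)
    elif "Han" in counts:
        counts["Chinese"] += counts.pop("Han")
    # Highest count wins; ties go to the language with the larger prior.
    return max(counts, key=lambda lang: (counts[lang], prior.get(lang, 0)))


print(line_language("ひらがな漢字"))
```

With these assumptions, `line_language("ひらがな漢字")` returns `Japanese`, and a purely numeric line such as `"123 !!"` falls back to `Symbols`.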

Confusion Matrix

Columns (Detection): Arabic, Latin, Chinese, Japanese, Korean, Bangla, Hindi, Symbols, None
Rows (GT):
Arabic: 491215618161058170
Latin: 639584144323461961102212790
Chinese: 127839706222073830
Japanese: 14510491420522512035117460
Korean: 2921347579286995989391490
Bangla: 952981423807030
Hindi: 54404214415410
Symbols: 46495401341741032690
None: 0 0 0 0 0 0 0 0 0
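A confusion matrix like the one above is usually read row by row. Assuming rows are ground-truth (GT) languages and columns are detected languages in the same order, the diagonal cell of each row divided by the row total gives that language's per-class recall. The sketch below shows this computation. The function name `per_class_recall` and the toy numbers are hypothetical and are not taken from the table. An all-zero row, like the None row, has no ground-truth samples, so its recall is left undefined.

```python
def per_class_recall(matrix, labels):
    """Return the fraction of each ground-truth class that was detected correctly.

    Assumes rows are GT classes and columns are detected classes, both in the
    same label order, so the diagonal holds the correct predictions.
    """
    recalls = {}
    for i, (label, row) in enumerate(zip(labels, matrix)):
        total = sum(row)
        # A class with no ground-truth samples (an all-zero row) has undefined recall.
        recalls[label] = row[i] / total if total else None
    return recalls


# Toy numbers for illustration only; not taken from the table above.
toy_labels = ["Latin", "Chinese", "None"]
toy_matrix = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],
]
print(per_class_recall(toy_matrix, toy_labels))
# → {'Latin': 0.8, 'Chinese': 0.9, 'None': None}
```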