method: Tencent-DPPR Team (Method_v0.1) 2019-05-27

Authors: Sicong Liu, Haoxi Li, Haibo Qin, Ben Xu, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are from the Tencent-DPPR (Data Platform Precision Recommendation) Team. We first recognize text lines using the ensemble results of several recognition models, which are based on CTC/Seq2Seq and on CNN with self-attention/RNN. We then identify the language type of each recognized result based on statistics from the MLT-2019 and Wikipedia corpora.
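The description does not say how the corpus statistics are used for language identification. As a minimal sketch of one plausible reading, the example below builds per-language character frequency counts and assigns a recognized string to the language whose smoothed character model scores it highest. The function names, the toy corpora, and the add-one smoothing are all assumptions made for illustration; they are not taken from the submission.

```python
import math
from collections import Counter

def build_char_stats(corpora):
    """Map each language to a Counter of its character frequencies (whitespace ignored)."""
    stats = {}
    for lang, lines in corpora.items():
        chars = [c for line in lines for c in line if not c.isspace()]
        stats[lang] = Counter(chars)
    return stats

def identify_language(text, stats, alpha=1.0):
    """Return the language whose smoothed character model gives text the highest log-likelihood."""
    chars = [c for c in text if not c.isspace()]
    # Shared vocabulary: every character seen in any corpus, plus those in the query.
    vocab = set(chars)
    for counts in stats.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in stats.items():
        total = sum(counts.values()) + alpha * len(vocab)
        # Additive smoothing keeps unseen characters from producing log(0).
        score = sum(math.log((counts[c] + alpha) / total) for c in chars)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang

# Hypothetical toy corpora standing in for the MLT-2019 and Wikipedia text.
TOY_CORPORA = {
    "Latin": ["hello world", "text recognition"],
    "Chinese": ["你好世界", "文字识别"],
    "Korean": ["안녕하세요", "문자 인식"],
}
STATS = build_char_stats(TOY_CORPORA)

print(identify_language("hello", STATS))
print(identify_language("世界你好", STATS))
```

A frequency model of this kind can separate scripts that share characters (for example Chinese and Japanese) only as well as the underlying corpora allow, which is one reason to draw statistics from a large source such as Wikipedia.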

Confusion Matrix

Detection (columns): Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Symbols | None
GT (rows):
Arabic: 49111582324554120
Latin: 669582335074201491252352990
Chinese: 127039816261573630
Japanese: 142995145652668434133470
Korean: 3041369650345978094413370
Bangla: 95599423619530
Hindi: 4400426416620
Symbols: 43469471681261332570
None: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
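A confusion matrix laid out this way, with ground-truth (GT) classes as rows and detected classes as columns, yields per-class recall from each row and per-class precision from each column. The sketch below shows that computation on a small hypothetical 3x3 matrix; the counts are invented for illustration and are not values from the table above, and the function name is an assumption.

```python
def per_class_metrics(labels, matrix):
    """Return {label: (precision, recall)} for a matrix with GT rows and predicted columns."""
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        row_total = sum(matrix[i])                  # all samples whose ground truth is this class
        col_total = sum(row[i] for row in matrix)   # all samples predicted as this class
        recall = correct / row_total if row_total else 0.0
        precision = correct / col_total if col_total else 0.0
        metrics[label] = (precision, recall)
    return metrics

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
TOY_LABELS = ["Arabic", "Latin", "Chinese"]
TOY_MATRIX = [
    [8, 2, 0],    # GT Arabic
    [1, 18, 1],   # GT Latin
    [0, 0, 10],   # GT Chinese
]
METRICS = per_class_metrics(TOY_LABELS, TOY_MATRIX)

for label, (precision, recall) in METRICS.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

In the toy matrix, 8 of the 10 Arabic ground-truth samples are detected as Arabic, so Arabic recall is 0.8, while 8 of the 9 samples detected as Arabic are correct, so Arabic precision is 8/9.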