Authors: Sicong Liu, Longhuang Wu, Shangxuan Tian, Haoxi Li, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao
Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) team. Our detection method follows the Mask R-CNN framework, which uses predicted masks to detect multi-oriented scene text. We train the text detector on the MLT-19 and MSRA-TD500 datasets and apply multi-scale training. To obtain the final detection results, we ensemble the outputs of two different backbones under different multi-scale testing settings. Our recognition methods are based on CTC/Seq2Seq and on CNNs with self-attention/RNN. Each cropped word is then recognized by the different models, and their outputs are combined into the ensemble result.
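
The description does not say how the outputs of the different recognition models are merged. As a minimal sketch only, assuming each model returns a transcription with a confidence score for a cropped word, one common option is confidence-weighted voting. The function name `ensemble_transcription` and the (text, confidence) input format are hypothetical and are not taken from the submission.

```python
from collections import defaultdict


def ensemble_transcription(predictions):
    """Merge several recognizers' outputs for one cropped word.

    `predictions` is a list of (text, confidence) pairs, one per model.
    This is an assumed scheme (confidence-weighted voting), not the
    authors' documented method: confidences are summed per distinct
    string and the string with the largest total is returned.
    """
    if not predictions:
        return ""

    totals = defaultdict(float)
    for text, confidence in predictions:
        totals[text] += confidence

    # max() keeps the first maximal entry, so ties go to the string
    # that was seen first in `predictions`.
    best_text, _ = max(totals.items(), key=lambda item: item[1])
    return best_text


if __name__ == "__main__":
    # Hypothetical outputs from three recognizers for the same crop.
    outputs = [("HOTEL", 0.6), ("H0TEL", 0.9), ("HOTEL", 0.5)]
    print(ensemble_transcription(outputs))
```

With these made-up scores, two moderately confident models that agree outweigh a single more confident model that disagrees, which is the behavior voting is meant to provide.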