Authors: Shangxuan Tian, Haoxi Li, Sicong Liu, Longhuang Wu, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Xucheng Yin, Lei Xiao
Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) team. In the detection stage, we pretrain our model on the LSVT dataset and then train the text detector on the provided ReCTS dataset. During training, we use a multi-scale training policy.
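A multi-scale training policy usually means that each training image is rescaled to a randomly chosen size before it enters the detector. The sketch below illustrates one common variant, assuming the shorter side is drawn from a fixed set and the longer side is capped; the scale set, the cap, and the function name are hypothetical, since the description does not give the actual values.

```python
import random

# Hypothetical values: the source does not state the scales actually used.
TRAIN_SHORT_SIDES = (640, 800, 960, 1120, 1280)
MAX_LONG_SIDE = 2000


def sample_training_size(height, width, rng=random):
    """Pick a random target short side and return the resized (h, w), keeping aspect ratio."""
    short_side = rng.choice(TRAIN_SHORT_SIDES)
    scale = short_side / min(height, width)
    # Clamp the scale so the longer side never exceeds MAX_LONG_SIDE.
    if max(height, width) * scale > MAX_LONG_SIDE:
        scale = MAX_LONG_SIDE / max(height, width)
    return round(height * scale), round(width * scale)


rng = random.Random(0)
print(sample_training_size(720, 1280, rng))
```

Sampling a new size per image (or per mini-batch) exposes the detector to text at many apparent sizes, which is the usual motivation for this policy.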
Our text detector is based on a two-stage method. In the backbone, we use ResNet101 as the feature extractor. In the FPN, we designed a policy that helps each proposal select which feature pyramid layers to extract features from, instead of assigning it to a single layer according to its box size.
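For reference, the single-layer rule that this policy replaces is the standard FPN heuristic (Lin et al.), which maps a proposal of size w x h to pyramid level P_k with k = floor(k0 + log2(sqrt(w*h) / 224)), k0 = 4, clamped to the available levels. The sketch below implements only that baseline rule so the contrast is concrete; the team's own layer-selection policy is not described in enough detail to reproduce, and the function name is illustrative.

```python
import math


def fpn_level_for_box(width, height, k0=4, canonical=224, k_min=2, k_max=5):
    """Baseline FPN heuristic: assign a RoI of size width x height to one level P_k."""
    # Larger boxes go to coarser levels; a canonical 224x224 box lands on P_k0.
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical))
    # Clamp to the pyramid levels that actually exist (P2..P5 by default).
    return max(k_min, min(k_max, k))


print(fpn_level_for_box(224, 224))
```

Under this rule every proposal reads features from exactly one level, so a long, thin text line whose area suggests a coarse level may lose fine detail; letting the proposal choose its levels avoids committing to a single size-based assignment.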
In the detection ensemble, we apply multi-scale testing with different backbones. When ensembling all the results, we developed an approach that votes on boxes after scoring each box.
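The description does not specify the voting rule, so the following is only a sketch of one well-known option, score-weighted box voting: detections from all models and scales are pooled, the highest-scoring remaining box seeds a cluster of every box overlapping it by at least an IoU threshold, and the cluster is replaced by the score-weighted average of its coordinates. Axis-aligned boxes, the threshold, the optional `min_votes` filter, and all function names are assumptions for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box (x1, y1, x2, y2) and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def vote_boxes(boxes, scores, iou_thr=0.5, min_votes=1):
    """Merge pooled detections by score-weighted voting within IoU clusters."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    remaining = np.ones(len(boxes), dtype=bool)
    merged, merged_scores = [], []
    for i in np.argsort(-scores):  # visit boxes from highest to lowest score
        if not remaining[i]:
            continue
        # The seed box is always in its own cluster (IoU with itself is 1).
        cluster = remaining & (iou_one_to_many(boxes[i], boxes) >= iou_thr)
        remaining &= ~cluster
        if cluster.sum() < min_votes:  # drop boxes too few detections agree on
            continue
        w = scores[cluster]
        merged.append((boxes[cluster] * w[:, None]).sum(axis=0) / w.sum())
        merged_scores.append(w.mean())
    return np.array(merged).reshape(-1, 4), np.array(merged_scores)
```

Raising `min_votes` above 1 keeps only boxes that several ensemble members agree on, trading recall for precision.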
In the recognition stage, we use a synthetic dataset containing more than fifty million images, together with the open-source datasets LSVT, ReCTS, COCO-Text, RCTW, and ICPR-2018-MTWI. Our data augmentation includes Gaussian blur and Gaussian noise, among other techniques. All samples are resized to the same height before being fed into the network.
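The preprocessing steps named above (Gaussian blur, Gaussian noise, and resizing to a fixed height) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: the target height of 32 pixels, the augmentation probabilities and parameter ranges, nearest-neighbour resizing, grayscale-only blur, and all function names are hypothetical and not taken from the description.

```python
import numpy as np

TARGET_HEIGHT = 32  # hypothetical; the source does not state the height used


def resize_to_height(img, target_h=TARGET_HEIGHT):
    """Nearest-neighbour resize of a (H, W) or (H, W, C) array to a fixed height, keeping aspect ratio."""
    h, w = img.shape[:2]
    new_w = max(1, round(w * target_h / h))
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows][:, cols]


def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the uint8 range."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D grayscale uint8 image (edge padding)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Blur along rows, then along columns; "valid" removes the padding again.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def augment_and_resize(img, rng):
    """Randomly apply blur and noise, then resize to the common input height."""
    if rng.random() < 0.5:
        img = gaussian_blur(img, sigma=rng.uniform(0.5, 1.5))
    if rng.random() < 0.5:
        img = gaussian_noise(img, sigma=rng.uniform(2.0, 10.0), rng=rng)
    return resize_to_height(img)
```

Fixing only the height keeps each word crop's aspect ratio, so the sequence length seen by a CTC or attention decoder grows with the text length rather than being squeezed into a fixed width.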
Five types of deep models are used in the recognition stage, including CTC-based networks and multi-head-attention-based networks. For Task 1, we select the character predicted most frequently across all models' results. For Task 2 and Task 4, we additionally use the predicted confidence scores of the cropped words, together with the ensemble results, to select the most reliable prediction among those of all models.
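The two ensemble rules can be sketched as follows. Task 1 is a plain majority vote over the characters predicted by the models. For Tasks 2 and 4 the exact combination of confidence scores and ensemble agreement is not given, so the rule below is a hypothetical one: each distinct transcription is ranked first by how many models produced it, with ties broken by its highest confidence. Function names and inputs are illustrative.

```python
from collections import Counter


def vote_character(predictions):
    """Task 1: return the character predicted by the most models.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the earliest model in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


def select_word(predictions):
    """Tasks 2/4 (hypothetical rule): predictions is a list of (text, confidence) pairs, one per model."""
    votes = Counter(text for text, _ in predictions)
    best_conf = {}
    for text, conf in predictions:
        best_conf[text] = max(conf, best_conf.get(text, 0.0))
    # Rank by agreement first, then by the best confidence for that text.
    return max(votes, key=lambda t: (votes[t], best_conf[t]))


preds = [("腾讯", 0.91), ("腾讯", 0.85), ("滕讯", 0.97), ("腾迅", 0.60), ("腾讯", 0.70)]
print(select_word(preds))
```

Ranking by agreement before confidence prevents a single overconfident model from overriding the consensus, while the confidence tie-breaker still decides when the models split evenly.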