Authors: Sicong Liu, Haoxi Li, Haibo Qin, Chunchao Guo, Longhuang Wu, Shangxuan Tian, Hongfa Wang, Hongkai Chen, Qinglin Lu, Xucheng Yin, Lei Xiao
Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) team. We use a synthetic dataset of more than fifty million images, together with open-source datasets including LSVT, ReCTS, COCO-Text, RCTW, and ICPR-2018-MTWI. Our data augmentation includes Gaussian blur and Gaussian noise, among other techniques. All samples are resized to the same height before being fed into the network.
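The preprocessing described above can be illustrated with a minimal sketch. It is not our actual pipeline: the function names, the noise level `sigma`, the target height of 32 pixels, the nearest-neighbour interpolation, and the choice to keep the aspect ratio are all assumptions made for illustration. Images are represented as plain lists of grayscale rows so the example has no dependencies.

```python
import random


def add_gaussian_noise(image, sigma=8.0, rng=None):
    """Add zero-mean Gaussian noise to a grayscale image.

    `image` is a list of rows with pixel values in 0..255.
    `sigma` is an illustrative value, not one taken from the description.
    """
    rng = rng or random.Random()
    return [
        # Clamp each noisy pixel back into the valid 0..255 range.
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def resize_to_height(image, target_height=32):
    """Nearest-neighbour resize to a fixed height.

    Assumption: the width is scaled by the same factor as the height,
    so the aspect ratio of the text line is preserved.
    """
    src_h, src_w = len(image), len(image[0])
    target_width = max(1, round(src_w * target_height / src_h))
    return [
        [
            image[min(src_h - 1, y * src_h // target_height)]
                 [min(src_w - 1, x * src_w // target_width)]
            for x in range(target_width)
        ]
        for y in range(target_height)
    ]
```

With this sketch, a 4x8 crop passed to `resize_to_height(crop, target_height=2)` comes back as a 2x4 image, so every sample in a batch shares the same height while widths still vary with the length of the text.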
Five types of deep models are used in our recognition stage, including CTC-based networks and multi-head-attention-based networks. For task 1, we select the character that occurs most frequently among the results of all models. For task 2 and task 4, we additionally use the predicted confidence scores of the cropped words, together with the ensemble results, to select the most reliable result among those predicted by all models.
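The two selection rules can be sketched as follows. This is one possible reading of the description rather than our exact procedure: `vote_character` and `select_word` are hypothetical names, breaking ties by model order is an assumption, and scoring each distinct word by the sum of the confidences of the models that predicted it is only one plausible way to combine confidence scores with ensemble agreement.

```python
from collections import Counter


def vote_character(predictions):
    """Task 1 sketch: return the most frequent character across models.

    `predictions` holds one predicted character per model.
    Ties are broken by model order (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for character in predictions:
        if counts[character] == best:
            return character


def select_word(candidates):
    """Task 2 / task 4 sketch: pick the most reliable word.

    `candidates` holds one (text, confidence) pair per model.
    Each distinct text is scored by the summed confidence of the
    models that predicted it, so agreement and confidence both count.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    scores = {}
    for text, confidence in candidates:
        scores[text] = scores.get(text, 0.0) + confidence
    return max(scores, key=scores.get)
```

For example, given the candidates `("hello", 0.9)`, `("hallo", 0.95)`, and `("hello", 0.8)`, the sketch returns `"hello"`: the single most confident model is outvoted because two models agree on the other reading.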