Authors: Liu Yang, Yang Fan, Lin Junyu, Tang Bin, Jin Xuan, Yuan Bo, He Yuan, Huang Longtao
Affiliation: Alibaba Artificial Intelligence Governance Research Center (AAIG)
Description: We used a regression-based text detector, a ViT-based text recognizer, and a Transformer-based NLP semantic correction module to perform the end-to-end text spotting task. First, we pre-trained the models on training sets including LSVT, RCTW, MLT, ArT, etc. Then we fine-tuned them on the ReCTS training set to obtain the final models. The final results were produced with single-scale testing and no model ensembling.
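The three-stage inference flow above (detect, then recognize, then correct) can be sketched as follows. This is a minimal illustration that assumes each stage is a plain callable. The stage interfaces, the `spot_text` function, the `SpottedText` record and the quadrilateral box format are all hypothetical and are not the authors' models or code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical region format: four (x, y) corner points of a quadrilateral.
Quad = Tuple[Tuple[float, float], ...]


@dataclass
class SpottedText:
    quad: Quad        # detected text region
    raw_text: str     # recognizer output before correction
    text: str         # text after semantic correction


# Hypothetical stage interfaces; the actual detector, recognizer and
# correction models are not specified beyond their general type.
Detector = Callable[[object], List[Quad]]        # image -> text regions
Recognizer = Callable[[object, Quad], str]       # (image, region) -> transcription
Corrector = Callable[[str], str]                 # transcription -> corrected text


def spot_text(image, detect: Detector, recognize: Recognizer,
              correct: Corrector) -> List[SpottedText]:
    """Run each stage once per region: one scale, no ensembling."""
    results = []
    for quad in detect(image):
        raw = recognize(image, quad)
        results.append(SpottedText(quad=quad, raw_text=raw, text=correct(raw)))
    return results


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    box = ((0, 0), (10, 0), (10, 5), (0, 5))
    fake_detect = lambda img: [box]
    fake_recognize = lambda img, q: "阿里巴吧"
    fake_correct = lambda s: s.replace("巴吧", "巴巴")
    for r in spot_text(None, fake_detect, fake_recognize, fake_correct):
        print(r.raw_text, "->", r.text)  # prints: 阿里巴吧 -> 阿里巴巴
```

Keeping the stages as separate callables mirrors the modular design described: the correction module only sees the recognizer's string output, so it can be trained and swapped independently of the detector and recognizer.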