Method: SCUT-MMOCR-KS (2023-03-20)

Authors: Yongxin Shi, Qing Jiang, Wocheng Xiao, Zhenghua Yang, Dezhi Peng, Chongyu Liu, Lianwen Jin, Kuikun Liu, Tong Gao, Wensong Lin, Guodong Liu, Chen Sun

Affiliation: South China University of Technology; Shanghai AI Laboratory; KingSoft Office CV R&D Department

Email: mountchicken@outlook.com

Description: We use DBNet++ for text detection. The detector is first pre-trained on a combination of TextOCR, HierText, DSText, YVT, ICDAR2015-Video, and Minetto, and then fine-tuned on DSText. A ViT-based recognizer is used for text recognition. The recognizer is first pre-trained on 10M unlabeled real scene text recognition (STR) images and then fine-tuned on 4M labeled real STR images. We use the tracking module from CoText for text tracking.
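
The three stages above (detection, recognition, tracking) can be chained per frame roughly as in the sketch below. This is only an illustration of the data flow, not the submitted system: `detect` and `recognize` are hypothetical callables standing in for DBNet++ and the ViT-based recognizer, and the greedy IoU association is a simple stand-in for the CoText tracking module, whose actual matching logic is not described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


@dataclass
class TrackedText:
    track_id: int
    frame_index: int
    box: Box
    text: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def run_pipeline(
    frames: Sequence[Any],
    detect: Callable[[Any], List[Box]],      # hypothetical detector interface
    recognize: Callable[[Any, Box], str],    # hypothetical recognizer interface
    iou_threshold: float = 0.5,              # assumed value, for illustration only
) -> List[TrackedText]:
    """Detect, recognize, and link text instances across consecutive frames."""
    results: List[TrackedText] = []
    previous: Dict[int, Box] = {}  # track id -> box in the previous frame
    next_id = 0
    for frame_index, frame in enumerate(frames):
        current: Dict[int, Box] = {}
        for box in detect(frame):
            text = recognize(frame, box)
            # Greedy association: reuse the best-overlapping unmatched track.
            best_id, best_score = None, iou_threshold
            for track_id, prev_box in previous.items():
                if track_id in current:
                    continue
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:  # no sufficient overlap: start a new track
                best_id = next_id
                next_id += 1
            current[best_id] = box
            results.append(TrackedText(best_id, frame_index, box, text))
        previous = current
    return results
```

Passing the models in as callables keeps the sketch independent of any particular framework; a real tracker would typically also use appearance or text cues and tolerate short occlusions, which this stand-in does not.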