Method: TH-DL (2023-03-28)

Authors: Gang Yao*, Ning Ding*, Kemeng Zhao, Huan Yu, Pei Tang, Haodong Shi, Liangrui Peng [*equal contribution]

Affiliation: Tsinghua University

Email: dn22@mails.tsinghua.edu.cn

Description: The TH-DL method provides an integrated scheme for text detection, recognition, and tracking in driving videos. For text detection and recognition, TESTR [1], a Transformer-based end-to-end text spotter, is adopted; the pre-trained TESTR model is fine-tuned on the training set of the RoadText Challenge. For multi-object tracking, ByteTrack [2] is employed. Besides high-score detections, ByteTrack also matches low-score detection boxes against existing tracklets, so that true text instances are recovered by their similarity to tracklets, while low-score boxes that match no tracklet are discarded as background. A post-processing module is then applied to filter out duplicate text detection and recognition instances.
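The two-stage association that ByteTrack [2] performs on each frame can be sketched as follows. This is a simplified illustration, not the TH-DL implementation. It assumes axis-aligned boxes (x1, y1, x2, y2) with a confidence score, although TESTR itself outputs polygons. It matches each detection against a track's last box by plain IoU, whereas ByteTrack predicts track positions with a Kalman filter and solves each stage with the Hungarian algorithm; greedy matching is swapped in here to keep the sketch dependency-free. The class `ByteTracker`, its thresholds, and `max_age` are hypothetical names and illustrative values.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)
Det = Tuple[Box, float]                  # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box          # last matched box (no motion model in this sketch)
    missed: int = 0   # consecutive frames without a match


def greedy_match(tracks: List[Track], dets: List[Det], iou_thresh: float):
    """Greedy IoU matching (stand-in for Hungarian assignment).

    Returns (matches, unmatched_track_indices, unmatched_det_indices).
    """
    pairs = sorted(
        ((iou(t.box, d[0]), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # pairs are sorted, so every remaining pair is worse
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    un_t = [i for i in range(len(tracks)) if i not in used_t]
    un_d = [i for i in range(len(dets)) if i not in used_d]
    return matches, un_t, un_d


class ByteTracker:
    """Hypothetical minimal BYTE-style tracker; thresholds are illustrative."""

    def __init__(self, high_thresh=0.5, low_thresh=0.1, match_iou=0.3, max_age=30):
        self.high_thresh, self.low_thresh = high_thresh, low_thresh
        self.match_iou, self.max_age = match_iou, max_age
        self.tracks: List[Track] = []
        self.next_id = 1

    def update(self, dets: List[Det]) -> List[Tuple[int, Box]]:
        high = [d for d in dets if d[1] > self.high_thresh]
        low = [d for d in dets if self.low_thresh < d[1] <= self.high_thresh]

        # Stage 1: associate every live track with the high-score boxes.
        m1, un_tracks, un_high = greedy_match(self.tracks, high, self.match_iou)
        for ti, di in m1:
            self.tracks[ti].box, self.tracks[ti].missed = high[di][0], 0

        # Stage 2: tracks left over from stage 1 may match low-score boxes
        # (e.g. blurred or occluded text); low-score boxes that match no
        # track are treated as background and dropped.
        leftover = [self.tracks[i] for i in un_tracks]
        m2, un_left, _ = greedy_match(leftover, low, self.match_iou)
        for li, di in m2:
            leftover[li].box, leftover[li].missed = low[di][0], 0

        # Tracks unmatched in both stages age and are removed after max_age frames.
        for li in un_left:
            leftover[li].missed += 1
        self.tracks = [t for t in self.tracks if t.missed <= self.max_age]

        # Unmatched high-score boxes start new tracks.
        for di in un_high:
            self.tracks.append(Track(self.next_id, high[di][0]))
            self.next_id += 1

        # Report only tracks that were matched or created in this frame.
        return [(t.track_id, t.box) for t in self.tracks if t.missed == 0]


tracker = ByteTracker()
print(tracker.update([((10, 10, 50, 30), 0.9)]))  # [(1, (10, 10, 50, 30))]
print(tracker.update([((12, 10, 52, 30), 0.3)]))  # [(1, (12, 10, 52, 30))]
```

The second frame illustrates why the second stage exists: the same sign is detected with a low score of 0.3, which a tracker keeping only confident detections would discard, but here it overlaps track 1 and keeps that identity alive.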

[1] Zhang X, Su Y, Tripathi S, et al. Text spotting transformers. CVPR, 2022: 9519-9528.

[2] Zhang Y, Sun P, Jiang Y, et al. ByteTrack: Multi-object tracking by associating every detection box. ECCV, 2022, LNCS, vol 13682: 1-21.