method: TencentOCR (2023-03-21)

Authors: Fan Yang, Lifu Wang, Huiwen Shi, Sicong Liu, Qingxiang Lin, Yuxin Wang, Haoxi Li, Weida Chen, Yushuo Guan, Minhui Wu, Chunchao Guo, Hongfa Wang, Wei Liu

Affiliation: TencentOCR

Description: We integrated the detection results of DBNet and Cascade MaskRCNN built with multiple backbone architectures, used the Parseq English recognition model for recognition, and further improved end-to-end tracking with ByteTrack. This pipeline produced our end-to-end tracking and trajectory recognition results.

method: HTAMotr (2025-02-13)

Authors: Peiqi Xie

Description: A novel Half-To-All Multiple Object Tracking (HTAMotr) approach is proposed to address the challenges posed by incomplete annotation in video text tracking. Three key strategies are introduced: rotated queries to improve anchor alignment with text regions, the Proposal-For-Groundtruth Strong Correlation (PForG) strategy to mitigate the negative effects of incomplete annotations, and an overlapping anchor filter to resolve ID switching issues. Experiments on the DSText dataset demonstrate the effectiveness of HTAMotr, which achieves state-of-the-art performance without requiring additional pre-training data or extensive epochs. By addressing the limitations of traditional MOTR paradigms, this work contributes to advancing video text tracking techniques and facilitating the development of more robust and efficient algorithms.

method: LOGO (2024-05-30)

Authors: Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan

Affiliation: College of Intelligence and Computing, Tianjin University; Tianjin University of Science and Technology; Baidu Inc.

Description: We propose a Language Collaboration and Glyph Perception Model, termed LOGO, to enhance the performance of conventional text spotters through the integration of a synergy module. To this end, a language synergy classifier (LSC) is designed to explicitly distinguish text instances from background noise in the recognition stage. In addition, glyph supervision and a visual position mixture module are proposed to improve recognition accuracy on noisy text regions and to acquire more discriminative tracking features, respectively.

Ranking Table

| Date | Method | MOTA | MOTP | IDF1 | Mostly Matched | Partially Matched | Mostly Lost |
|---|---|---|---|---|---|---|---|
| 2023-03-21 | TencentOCR | 62.56% | 79.88% | 75.87% | 8114 | 1800 | 2663 |
| 2025-02-13 | HTAMotr | 55.15% | 75.03% | 63.70% | 6428 | 3908 | 2241 |
| 2024-05-30 | LOGO | 51.36% | 77.57% | 65.70% | 5743 | 2100 | 4734 |
| 2023-03-21 | DA | 50.52% | 78.33% | 70.99% | 7121 | 2405 | 3051 |
| 2023-03-20 | TransDeTR+HRNet | 43.52% | 78.15% | 62.27% | 4980 | 2264 | 5333 |
| 2023-03-16 | TransDETR+HRNET(0) | 42.45% | 77.99% | 61.51% | 4961 | 2212 | 5404 |
| 2024-03-25 | ldswo | 37.38% | 76.08% | 53.24% | 3324 | 2649 | 6604 |
| 2023-03-20 | Video Text Tracking for Dense and Small Text Based on PP-YOLOE-R and Sort Algorithm | 36.87% | 79.24% | 48.99% | 2123 | 3625 | 6829 |
| 2023-03-15 | DA | 34.52% | 74.79% | 57.87% | 4383 | 2901 | 5293 |
| 2023-03-21 | solar flare | 31.01% | 78.00% | 50.39% | 2361 | 1767 | 8449 |
| 2023-03-21 | solar flare | 31.01% | 78.00% | 50.39% | 2361 | 1767 | 8449 |
| 2023-03-21 | solar flare | 31.01% | 78.00% | 50.39% | 2361 | 1767 | 8449 |
| 2023-03-21 | solar flare | 31.01% | 78.00% | 50.39% | 2361 | 1767 | 8449 |
| 2023-03-18 | Text_Localization | 28.92% | 78.46% | 43.96% | 1385 | 1186 | 10006 |
| 2023-03-17 | CQUT-TransDETR | 27.55% | 78.40% | 44.28% | 1583 | 1103 | 9891 |
| 2023-03-19 | TextTrack | 25.75% | 74.03% | 50.22% | 3302 | 2806 | 6469 |
| 2023-03-18 | OCR_Video | 23.41% | 75.54% | 49.66% | 5216 | 3578 | 3783 |
| 2023-03-20 | seq_trans_e8 | 19.85% | 71.98% | 39.87% | 2815 | 3354 | 6408 |
| 2023-03-21 | abcmot | 19.84% | 73.82% | 31.18% | 924 | 1765 | 9888 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-21 | abcmot | 19.82% | 73.84% | 31.16% | 925 | 1760 | 9892 |
| 2023-03-20 | seq_trans_e7 | 19.35% | 71.83% | 40.67% | 2867 | 3343 | 6367 |
| 2023-03-19 | SCUT-MMOCR-KS | 13.83% | 75.75% | 58.41% | 6924 | 2622 | 3031 |
| 2023-03-15 | TextTrack | 11.32% | 71.50% | 47.46% | 3702 | 2957 | 5918 |
| 2023-03-16 | OCR_kuanguang | 7.49% | 75.62% | 45.68% | 5403 | 3835 | 3339 |
| 2023-03-15 | res_e8 | 0.00% | 0.00% | 0.00% | | | |
| 2023-03-16 | Feat_e12 | 0.00% | 0.00% | 0.00% | | | |
| 2023-03-19 | submit 1 : YOLOv7 + StrongSORT | 0.00% | 0.00% | 0.00% | | | |
| 2023-03-19 | submit 2: YOLOv7 + StrongSORT | 0.00% | 0.00% | 0.00% | | | |
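
As a rough aid for reading the ranking table, the sketch below turns the three trajectory counts of a row into shares of all ground-truth trajectories and computes MOTA from error counts. It assumes the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, which may differ in detail from the competition's own evaluation script. The function names are illustrative, the TencentOCR counts are read from the table as 8114, 1800, and 2663, and the error counts passed to `mota` are made-up numbers used only to show the arithmetic.

```python
def trajectory_shares(mostly_matched, partially_matched, mostly_lost):
    """Return each trajectory category as a fraction of all ground-truth trajectories."""
    total = mostly_matched + partially_matched + mostly_lost
    if total <= 0:
        raise ValueError("trajectory counts must sum to a positive number")
    return {
        "total": total,
        "mostly_matched": mostly_matched / total,
        "partially_matched": partially_matched / total,
        "mostly_lost": mostly_lost / total,
    }


def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy (assumed definition): 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Trajectory counts for the TencentOCR row, as read from the table.
tencent = trajectory_shares(8114, 1800, 2663)
print(f"total trajectories: {tencent['total']}")
print(f"mostly matched share: {tencent['mostly_matched']:.2%}")

# Hypothetical error counts, NOT taken from the leaderboard.
example_mota = mota(false_negatives=30, false_positives=10, id_switches=5, num_gt=100)
print(f"example MOTA: {example_mota:.2%}")
```

The three counts sum to 12577 for every scored row in the table, which gives a quick consistency check on any single row.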

Ranking Graphic