Results - Video Text Reading Competition for Dense and Small Text

Method: TencentOCR (2023-03-21)

Authors: Fan Yang, Lifu Wang, Huiwen Shi, Sicong Liu, Qingxiang Lin, Yuxin Wang, Haoxi Li, Weida Chen, Yushuo Guan, Minhui Wu, Chunchao Guo, Hongfa Wang, Wei Liu

Affiliation: TencentOCR

Description: We integrated the detection results of DBNet and Cascade MaskRCNN, each built with multiple backbone architectures, used the Parseq English recognition model for text recognition, and further improved end-to-end tracking with ByteTrack. This pipeline produced our end-to-end tracking and trajectory recognition results.
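The description does not say how the detections from the different models were integrated. Purely as an illustration, one common way to combine boxes from several detectors is to pool them and apply greedy IoU-based non-maximum suppression. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2, score) and an arbitrary IoU threshold of 0.5; the function names and the threshold are hypothetical and are not taken from the TencentOCR submission.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(*detector_outputs: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Pool boxes from all detectors, then keep the highest-scoring box
    among any group that overlaps by at least iou_thr (greedy NMS)."""
    pooled = sorted(
        (box for output in detector_outputs for box in output),
        key=lambda box: box[4],
        reverse=True,
    )
    kept: List[Box] = []
    for box in pooled:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


# Hypothetical outputs from two detectors on one frame.
detector_a = [(0.0, 0.0, 10.0, 10.0, 0.9)]
detector_b = [(1.0, 1.0, 11.0, 11.0, 0.8), (50.0, 50.0, 60.0, 60.0, 0.7)]
merged = merge_detections(detector_a, detector_b)
```

In this example the two heavily overlapping boxes collapse into the higher-scoring one, while the isolated box is kept. Text detectors such as DBNet typically output polygons or rotated boxes, so a real pipeline would need a polygon IoU instead of the axis-aligned one used here.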

Method: HTAMotr (2025-02-13)

Authors: Peiqi Xie

Description: A novel Half-To-All Multiple Object Tracking (HTAMotr) approach is proposed to address the challenges posed by incomplete annotation in video text tracking. Three key strategies are introduced: rotated queries to improve anchor alignment with text regions, the Proposal-For-Groundtruth Strong Correlation (PForG) strategy to mitigate the negative effects of incomplete annotations, and an overlapping anchor filter to resolve ID switching issues. Experiments conducted on the DSText dataset demonstrate the effectiveness of HTAMotr, which achieves state-of-the-art performance without requiring additional pre-training data or extensive epochs. By addressing the limitations of traditional MOTR paradigms, this work contributes to advancing video text tracking techniques and facilitating the development of more robust and efficient algorithms.

Method: LOGO (2024-05-30)

Authors: Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan

Affiliation: College of Intelligence and Computing, Tianjin University; Tianjin University of Science and Technology; Baidu Inc.

Description: We propose a Language Collaboration and Glyph Perception Model, termed LOGO, to enhance the performance of conventional text spotters through the integration of a synergy module. To achieve this goal, a language synergy classifier (LSC) is designed to explicitly discern text instances from background noise in the recognition stage. In addition, glyph supervision and a visual position mixture module are proposed to enhance the recognition accuracy of noisy text regions and to acquire more discriminative tracking features, respectively.

Hongen Liu, Di Sun, Jiahao Wang, Yi Liu and Gang Pan. "LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model", arXiv preprint arXiv:2405.19194, 2024.

Ranking Table

Date	Method	MOTA	MOTP	IDF1	Mostly Matched	Partially Matched	Mostly Lost
2023-03-21	TencentOCR	62.56%	79.88%	75.87%	8114	1800	2663
2025-02-13	HTAMotr	55.15%	75.03%	63.70%	6428	3908	2241
2024-05-30	LOGO	51.36%	77.57%	65.70%	5743	2100	4734
2023-03-21	DA	50.52%	78.33%	70.99%	7121	2405	3051
2023-03-20	TransDeTR+HRNet	43.52%	78.15%	62.27%	4980	2264	5333
2023-03-16	TransDETR+HRNET(0)	42.45%	77.99%	61.51%	4961	2212	5404
2024-03-25	ldswo	37.38%	76.08%	53.24%	3324	2649	6604
2023-03-20	Video Text Tracking for Dense and Small Text Based on PP-YOLOE-R and Sort Algorithm	36.87%	79.24%	48.99%	2123	3625	6829
2023-03-15	DA	34.52%	74.79%	57.87%	4383	2901	5293
2023-03-21	solar flare	31.01%	78.00%	50.39%	2361	1767	8449
2023-03-21	solar flare	31.01%	78.00%	50.39%	2361	1767	8449
2023-03-21	solar flare	31.01%	78.00%	50.39%	2361	1767	8449
2023-03-21	solar flare	31.01%	78.00%	50.39%	2361	1767	8449
2023-03-18	Text_Localization	28.92%	78.46%	43.96%	1385	1186	10006
2023-03-17	CQUT-TransDETR	27.55%	78.40%	44.28%	1583	1103	9891
2023-03-19	TextTrack	25.75%	74.03%	50.22%	3302	2806	6469
2023-03-18	OCR_Video	23.41%	75.54%	49.66%	5216	3578	3783
2023-03-20	seq_trans_e8	19.85%	71.98%	39.87%	2815	3354	6408
2023-03-21	abcmot	19.84%	73.82%	31.18%	924	1765	9888
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-21	abcmot	19.82%	73.84%	31.16%	925	1760	9892
2023-03-20	seq_trans_e7	19.35%	71.83%	40.67%	2867	3343	6367
2023-03-19	SCUT-MMOCR-KS	13.83%	75.75%	58.41%	6924	2622	3031
2023-03-15	TextTrack	11.32%	71.50%	47.46%	3702	2957	5918
2023-03-16	OCR_kuanguang	7.49%	75.62%	45.68%	5403	3835	3339
2023-03-15	res_e8	0.00%	0.00%	0.00%
2023-03-16	Feat_e12	0.00%	0.00%	0.00%
2023-03-19	submit 1 : YOLOv7 + StrongSORT	0.00%	0.00%	0.00%
2023-03-19	submit 2: YOLOv7 + StrongSORT	0.00%	0.00%	0.00%
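The table reports MOTA and IDF1 without defining them. As a reminder of how these tracking metrics are usually defined, and not a description of the competition's exact evaluation script, the sketch below computes both from error counts. The function names and the example counts are hypothetical and do not correspond to any submission listed above.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int,
         num_ground_truth: int) -> float:
    """Multiple Object Tracking Accuracy:
    1 - (FN + FP + IDSW) / GT, with counts summed over all frames.
    The value can be negative when errors outnumber ground-truth objects."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def idf1(id_true_pos: int, id_false_pos: int, id_false_neg: int) -> float:
    """ID F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denominator = 2 * id_true_pos + id_false_pos + id_false_neg
    return 2 * id_true_pos / denominator if denominator > 0 else 0.0


# Hypothetical counts, for illustration only.
example_mota = mota(false_negatives=20, false_positives=10,
                    id_switches=5, num_ground_truth=100)
example_idf1 = idf1(id_true_pos=80, id_false_pos=10, id_false_neg=30)
```

Because MOTA penalizes every false positive, a tracker that follows many trajectories correctly can still score low on MOTA if it also emits many spurious boxes, which is one reason the IDF1 and MOTA orderings of submissions can differ.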
