Results - RoadText Competition on Video Text Detection, Tracking and Recognition

method: ClusterFlow (2023-03-28)

Authors: Anthony Sherbondy, Renshen Wang

Affiliation: Google

Description: ClusterFlow is designed specifically to address the problem of extracting text from videos as presented in the RoadText1k dataset. The main motivation is to demonstrate the utility of combining commodity algorithms for OCR, optical flow, clustering, and classification with decision trees.

First, we use a public cloud API to extract OCR results at line-level granularity on every image frame (~300) of each video. Next, we use a modern RAFT implementation to calculate a dense, pixel-level optical flow field for every image. The optical flow field is then used to extrude the OCR line results temporally, creating tubes, or tracklets, of lines. An unsupervised clustering algorithm then groups the line text tracklets into clusters across the entire video. The distance metric between tracklets, the clustering algorithm, and its hyperparameters are searched on the training dataset.
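The extrusion step can be illustrated with a minimal sketch. It assumes the dense flow is already available as a NumPy array of shape (H, W, 2) holding per-pixel (dx, dy) displacements to the next frame, and it shifts each box by the median flow inside it. The median rule and the function names `propagate_box` and `extrude` are assumptions made for illustration; the description above does not say how ClusterFlow aggregates the flow.

```python
import numpy as np

def propagate_box(box, flow):
    """Shift an axis-aligned box (x0, y0, x1, y1) by the median flow inside it.

    flow: array of shape (H, W, 2) with per-pixel (dx, dy) to the next frame.
    Returns the shifted box, or None if the box lies outside the frame.
    """
    x0, y0, x1, y1 = box
    h, w = flow.shape[:2]
    # Clip the box to the image and convert to integer pixel indices.
    c0, c1 = int(max(0, np.floor(x0))), int(min(w, np.ceil(x1)))
    r0, r1 = int(max(0, np.floor(y0))), int(min(h, np.ceil(y1)))
    if c1 <= c0 or r1 <= r0:
        return None
    patch = flow[r0:r1, c0:c1].reshape(-1, 2)
    # Hypothetical aggregation rule: the median is robust to outlier pixels.
    dx, dy = np.median(patch, axis=0)
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

def extrude(box, flows):
    """Build a tracklet by pushing one box through consecutive flow fields."""
    tracklet = [box]
    for flow in flows:
        nxt = propagate_box(tracklet[-1], flow)
        if nxt is None:
            break  # the box has left the frame
        tracklet.append(nxt)
    return tracklet
```

With a constant flow of (2, 1) pixels per frame, `extrude` moves a box two pixels right and one pixel down at each step, which is an easy sanity check before plugging in real RAFT output.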

Given the clustered tracklets, the algorithm then selects geometry and text from the tracklets to create tracked lines that appear at most once in any video frame. To do this, a set of features is generated from each line appearance, tracklet, and cluster and fed into a classification algorithm. The classifier is trained to select the appearances in each cluster that match the ground truth in the training set. At inference, the classification probabilities are used to choose among the possible line text appearances within a cluster at each video frame.
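The inference-time selection can be sketched as follows, assuming each candidate appearance already carries a cluster id, a frame index, and a classifier probability. The tuple layout and the name `select_appearances` are hypothetical, and picking the single highest-probability candidate per frame is one plausible reading of the description rather than the authors' confirmed rule.

```python
from collections import defaultdict

def select_appearances(candidates):
    """Keep at most one appearance per (cluster, frame).

    candidates: iterable of (cluster_id, frame, prob, payload), where payload
    is whatever describes the appearance (for example geometry and text).
    Returns {cluster_id: {frame: payload}}.
    """
    best = {}
    for cluster_id, frame, prob, payload in candidates:
        key = (cluster_id, frame)
        # Retain only the highest-probability candidate seen for this key.
        if key not in best or prob > best[key][0]:
            best[key] = (prob, payload)
    tracks = defaultdict(dict)
    for (cluster_id, frame), (_, payload) in best.items():
        tracks[cluster_id][frame] = payload
    return dict(tracks)
```

Each resulting track then has a single line per frame, which is the property the description requires of the final tracked lines.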

method: TH-DN (2023-03-28)

Authors: Ning Ding, Kemeng Zhao, Gang Yao, Pei Tang, Haodong Shi, Liangrui Peng

Affiliation: Tsinghua University

Email: dn22@mails.tsinghua.edu.cn

Description: The TH-DN method consists of detection, tracking, and recognition modules. For detection, YOLOX [1] with a ResNet50 backbone is employed. For multi-object tracking, ByteTrack [2] is used; it also associates low-score detection boxes, using their similarity to existing tracklets to recover true objects and filter out background detections. For recognition, an encoder-decoder architecture is adopted: the backbone is a variant of ResNet, the encoder is a bi-directional LSTM network, and the decoder is a Transformer module.

[1] Zheng G, Liu S, Wang F, et al. YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, 2021.
[2] Zhang Y, Sun P, Jiang Y, et al. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In ECCV, 2022: 1-21.
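The two-stage association idea behind ByteTrack can be sketched in simplified form: match tracks to high-score detections first, then try the leftover tracks against low-score detections. This sketch uses greedy IoU matching on raw boxes, whereas ByteTrack itself matches against Kalman-filter predictions with Hungarian assignment. The thresholds `high`, `low`, and `min_iou` and all function names are assumptions for illustration, not values used by TH-DN.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, boxes, min_iou):
    """Greedily pair tracks {id: box} with boxes, best IoU first."""
    pairs = sorted(
        ((iou(tb, db), tid, j) for tid, tb in tracks.items() for j, db in enumerate(boxes)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches

def associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """detections: list of (box, score). Returns (assigned, unmatched_high)."""
    high_dets = [d for d in detections if d[1] >= high]
    low_dets = [d for d in detections if low <= d[1] < high]
    # Stage 1: match all tracks against high-score detections.
    first = greedy_match(tracks, [b for b, _ in high_dets], min_iou)
    assigned = {tid: high_dets[j][0] for tid, j in first.items()}
    # Stage 2: match leftover tracks against low-score detections.
    remaining = {tid: b for tid, b in tracks.items() if tid not in assigned}
    second = greedy_match(remaining, [b for b, _ in low_dets], min_iou)
    assigned.update({tid: low_dets[j][0] for tid, j in second.items()})
    # Unmatched high-score boxes are candidates for starting new tracks;
    # unmatched low-score boxes are treated as background and dropped.
    taken = set(first.values())
    unmatched_high = [b for j, (b, _) in enumerate(high_dets) if j not in taken]
    return assigned, unmatched_high
```

The point of the second stage is that a blurred or partly occluded text instance often gets a low detection score but still overlaps its existing track, so it can be recovered without admitting unrelated low-score boxes.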

method: TencentOCR V5 (2023-03-28)

Authors: Haoxi Li, Weida Chen, Huiwen Shi, Sicong Liu, Fan Yang, Lifu Wang, Qingxiang Lin, Yuxin Wang, Mei Jiang, Jing Lv, Chunchao Guo, Hongfa Wang, Dapeng Tao, Wei Liu

Affiliation: TencentOCR

Description: We integrated the detection results of DBNet and Cascade Mask R-CNN models built with multiple backbone architectures, used the PARSeq English recognition model for recognition, and further improved end-to-end tracking with ByteTrack. This yields end-to-end tracking and trajectory recognition results.

Ranking Table


Date	Method	MOTA	MOTP	IDF1	Mostly Matched	Partially Matched	Mostly Lost
2023-03-28	ClusterFlow	11.0887	69.04%	48.07%	1392	920	2668
2023-03-28	TH-DN	-4.5044	63.95%	31.65%	553	724	3700
2023-03-28	TencentOCR V5	-15.3601	50.73%	15.58%	206	412	4053
2023-03-21	TencentOCR V1	-16.8418	55.96%	16.07%	243	447	4172
2023-03-28	TH-DL	-23.1028	72.83%	37.34%	1235	737	3020
2023-03-28	TencentOCR	-23.8669	56.19%	19.71%	315	454	4102
2023-03-26	TransDETR	-28.4962	68.74%	26.87%	660	741	3589
2023-03-27	TencentOCR V4	-28.7695	64.41%	24.79%	555	862	3566
2023-03-28	roadtext-pingan	-47.4891	69.09%	27.33%	879	500	3608
2023-03-20	roadText-pingan	-52.2344	70.05%	26.33%	880	481	3607
2023-03-27	solar flare	-60.4763	60.87%	7.28%	209	651	4132
2023-03-24	RoadText DRTE	-61.3921	65.47%	12.08%	146	823	4020
2023-03-28	roadtext-pingan	-62.9938	62.78%	19.88%	625	352	4010
2023-03-20	SCUT-MMOCR-KS	-77.1913	67.83%	29.96%	1196	918	2878
2023-03-20	YBP	-102.1287	67.44%	11.64%	406	727	3859
2023-03-20	YBP	-102.1382	67.44%	11.63%	406	727	3859
