Overview - Video Text Reading Competition for Dense and Small Text

Video text spotting [1] has received increasing attention due to its numerous applications in computer vision, e.g., video understanding, video retrieval, video text translation, and license plate recognition. However, progress in video text spotting has largely stalled due to the lack of practical datasets and effective methods.

Several video text spotting benchmarks already exist, such as ICDAR2015 (Text in Videos) [2], YouTube Video Text (YVT) [3], RoadText-1K [4], and BOVText [5]. However, they focus on common text cases (e.g., normal size and density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios.

This challenge focuses on dense and small text reading (DSText) in videos across various scenarios.

Most existing algorithms and benchmarks focus on common text cases and single scenarios. In this competition, we establish a video text reading benchmark named DSText, which targets dense and small text in videos across various scenarios. Compared with previous datasets, DSText mainly introduces three new challenges: 1) dense video text, a new challenge for video text spotters; 2) a high proportion of small text; and 3) various new scenarios, e.g., ‘Game’ and ‘Sports’.

In addition, like the ICDAR2015 video text spotting challenge, DSText presents some common technical challenges for video text. For example, video frames are generally of lower quality than static images because of motion blur and defocus, and video compression can introduce further artefacts. How to exploit the temporal information in videos for effective video text spotting also remains an open problem.

To reduce the cost (GPU computational expense) of algorithm research in the community, we selected and annotated short sequences of around 15 seconds each. They contain a large amount of text (around 23.5 text instances per frame) across 11 open real-life scenarios and an “Unknown” scenario.

Example visualizations and a video demo are shown below:

Figure 1. Visualization for "Indoor Street View" scenario in DSText

Video Demo on YouTube for DSText

New: The competition report [6] can be found here. If you have any questions, feel free to contact me. Thanks!

References

[1] Yin, Xu-Cheng, Ze-Yu Zuo, Shu Tian, and Cheng-Lin Liu. "Text detection, tracking and recognition in video: a comprehensive survey." IEEE Transactions on Image Processing 25, no. 6 (2016): 2752-2773.

[2] Karatzas, Dimosthenis, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas et al. "ICDAR 2015 competition on robust reading." In 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156-1160. IEEE, 2015.

[3] Nguyen, Phuc Xuan, Kai Wang, and Serge Belongie. "Video text detection and recognition: Dataset and benchmark." In IEEE Winter Conference on Applications of Computer Vision, pp. 776-783. IEEE, 2014.

[4] Reddy, Sangeeth, Minesh Mathew, Lluis Gomez, Marçal Rusinol, Dimosthenis Karatzas, and C. V. Jawahar. "RoadText-1K: Text detection & recognition dataset for driving videos." In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 11074-11080. IEEE, 2020.

[5] Wu, Weijia, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, and Hong Zhou. "A bilingual, OpenWorld video text dataset and end-to-end video text spotter with transformer." arXiv preprint arXiv:2112.04888 (2021).

[6] Wu, Weijia, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, and Xiang Bai. "ICDAR 2023 Video Text Reading Competition for Dense and Small Text." arXiv preprint arXiv:2304.04376 (2023).

Challenge News

Important Dates

20th December to 30th December 2022

i) Q&A period for the competition,

ii) Launch of the initial website.

2nd February 2023

i) Sample training videos available,

ii) Evaluation protocol, file formats etc. available.

15th February 2023

i) Competition kicks off officially,

ii) Release of training set videos and ground truth (50 videos).

15th March 2023

i) Test set is available (50 videos)

ii) Website opens for results submission.

20th March 2023

i) Competition deadline; result submission closes (at 23:59 PDT).

31st March 2023

Submission deadline for the one-page competition report; the final ranking will be released after the results are checked.

21st to 26th August 2023

Announcement of competition results at ICDAR2023.