Tasks - Video Text Reading Competition for Dense and Small Text
The Challenge will support two tasks:
- Text Localisation, where the objective is to localize and track all words in the video sequences.
- End to End, where the objective is to localize, track, and recognize all words in the video sequence.
Task1 - Video Text Tracking
The task requires a single network to detect and track text over the video sequence simultaneously. Given an input video, the method should produce two results: rotated detection boxes, and tracking IDs that associate the same text instance over the whole video sequence.
All the videos will be provided as MP4 files.
For convenience, following the format of the ICDAR 2015 Text in Video competition, the ground truth will be provided as a single XML file per video. The format of the ground truth file follows the structure of the example below:
<frame ID="1">
  <object Transcription="910" ID="1002" Language="English" Category="Scene Text">
    <Point x="97" y="382" />
    <Point x="607" y="305" />
    <Point x="609" y="307" />
    <Point x="98" y="384" />
  </object>
</frame>
An <object Transcription="transcription" ID="num_id" Language="language" Category="title/caption/scene text"> element represents each of the objects (words) in the frame.
- Transcription is the textual transcription of the word. (If the transcription is ###, the text is of low quality and unreadable. During evaluation, such text will not be taken into account: a method will not be penalized if it does not detect these words, and a method that detects them will not receive a higher score.)
- ID is a unique identifier of an object; all occurrences of the same object have the same ID.
- Language defines the language the word is written in.
- Category defines which class the word belongs to (title/caption/scene text).
If no objects exist in a particular frame, the frame tag is created empty.
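As a rough illustration of how the per-video ground truth can be consumed, the sketch below parses the layout described above with the standard library. The tag names (frame/object/Point) are taken from the example; the official files may differ in details such as a wrapping root element:

```python
# Sketch of parsing a per-video ground-truth XML file, assuming the
# ICDAR "Text in Video" layout: <frame ID="..."> containing <object ...>
# elements, each with four <Point x="..." y="..."/> corners.
import xml.etree.ElementTree as ET

def parse_ground_truth(xml_text):
    """Return {frame_id: [object dicts]}; unreadable text (###) is kept
    but flagged so evaluation code can ignore it."""
    root = ET.fromstring(xml_text)
    frames = {}
    for frame in root.iter("frame"):
        objects = []
        for obj in frame.iter("object"):
            points = [(int(p.get("x")), int(p.get("y")))
                      for p in obj.iter("Point")]
            objects.append({
                "id": obj.get("ID"),
                "transcription": obj.get("Transcription"),
                "language": obj.get("Language"),
                "category": obj.get("Category"),
                "ignore": obj.get("Transcription") == "###",
                "points": points,  # rotated box as a 4-point polygon
            })
        frames[frame.get("ID")] = objects
    return frames
```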
Participants are required to automatically localize the words in the video frames and return rotated bounding boxes in the same XML format. In the submitted XML files, only the ID attribute is expected for each object; any other attributes will be ignored.
A single compressed (zip or rar) file should be submitted containing all the result files for all the videos of the test set. If your method fails to produce any results for a particular video, simply omit the XML file for that video.
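Since only the ID attribute is read from submissions, a result writer can be quite small. The following is a minimal sketch under the same assumed tag layout as the ground-truth example (the function name and input structure are mine):

```python
# Hedged sketch of writing a result XML per video: only the ID attribute
# is evaluated, so no other object attributes are emitted. Tag names
# (Frames/frame/object/Point) follow the assumed ground-truth layout.
import xml.etree.ElementTree as ET

def write_results_xml(tracks_per_frame):
    """tracks_per_frame: {frame_id: [(track_id, [(x, y), ...4 corners]), ...]}."""
    root = ET.Element("Frames")
    for frame_id, tracks in tracks_per_frame.items():
        frame = ET.SubElement(root, "frame", ID=str(frame_id))
        for track_id, points in tracks:
            obj = ET.SubElement(frame, "object", ID=str(track_id))
            for x, y in points:
                ET.SubElement(obj, "Point", x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")
```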
For simplicity, we adopt the evaluation method of the ICDAR 2015 Text in Video competition. The evaluation is based on an adaptation of the MOTChallenge framework for multiple object tracking. For each method, MOTChallenge provides three metrics: Multiple Object Tracking Precision (MOTP), Multiple Object Tracking Accuracy (MOTA), and the IDF1 score. See the ICDAR 2013 competition report and MOTChallenge for details about these metrics.
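As a rough illustration (not the official evaluation code), the two summary metrics can be sketched as below. MOTA penalizes misses, false positives, and identity switches relative to the total number of ground-truth objects; MOTP averages the localization overlap of matched pairs. Variable names are mine, and the official MOTChallenge toolkit handles the matching step and edge cases this sketch omits:

```python
# Minimal sketch of the MOTChallenge summary metrics (illustrative only).
def mota(num_misses, num_false_positives, num_id_switches, num_gt):
    # MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects
    return 1.0 - (num_misses + num_false_positives + num_id_switches) / num_gt

def motp(match_overlaps):
    # match_overlaps: IoU of each matched GT/result pair over all frames
    return sum(match_overlaps) / len(match_overlaps)
```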
Task2 - End-to-End Video Text Spotting
Video Text Spotting (VTS) is the task of simultaneously detecting, tracking, and recognizing text in the video.
Therefore, building on the Task 1 text tracking output, participants need to provide the corresponding recognition result for each text track ID.
Unlike ICDAR 2015 and ICDAR 2013, we do not provide a vocabulary of the words in the training and test sets; the recognition task is open-set. We focus only on English and alphanumeric text; text in other languages in the video will be annotated as ignored (transcription ###).
The evaluation metric is the same as in the ICDAR 2015 Text in Video competition. Word recognition performance is evaluated simply by whether a word recognition result is completely correct. The word recognition evaluation is case-insensitive and accent-insensitive. All non-alphanumeric characters are ignored, including decimal points; for example, '1.9' becomes '19' in our ground truth.
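The matching rule above can be sketched as a normalization followed by exact comparison. This is my reading of the stated rule (case-insensitive, accent-insensitive, non-alphanumerics dropped); the official evaluation script may differ in edge cases:

```python
# Hedged sketch of the word-matching rule: compare case-insensitively,
# drop accents, and strip non-alphanumeric characters (so "1.9" matches
# the ground-truth "19").
import unicodedata

def normalize_word(word):
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep alphanumerics only, lowercased
    return "".join(c for c in stripped if c.isalnum()).lower()

def word_correct(prediction, ground_truth):
    # A word counts as correct only if it matches completely after normalization
    return normalize_word(prediction) == normalize_word(ground_truth)
```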
For each video, participants are required to submit an XML file in the same format as Task 1, plus an additional text file containing the word recognition results, where the first field contains the track IDs and the second field contains the corresponding recognition results.
Karatzas, Dimosthenis, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, et al. "ICDAR 2015 Competition on Robust Reading." In Proc. 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156-1160. IEEE, 2015.
Karatzas, D., F. Shafait, S. Uchida, M. Iwamura, L. Gomez, S. Robles, J. Mas, D. Fernandez, J. Almazan, and L. P. de las Heras. "ICDAR 2013 Robust Reading Competition." In Proc. 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1115-1124. IEEE, 2013.
Dendorfer, Patrick, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, and Laura Leal-Taixe. "CVPR19 Tracking and Detection Challenge: How Crowded Can It Get?" arXiv preprint arXiv:1906.04567 (2019).
20th December to 30th December, 2022
i) Q&A period for the competition,
ii) The launching of the initial website
2nd February 2023
i) Sample training videos available,
ii) Evaluation protocol, file formats etc. available.
15th February 2023
i) Competition kicks off officially,
ii) Release of training set videos and ground truth (50 videos).
15th March 2023
i) Test set available (50 videos),
ii) Website opens for results submission.
20th March 2023
i) Deadline of the competition; result submission closes (at 23:59 PDT).
31st March 2023
Submission deadline for the 1-page competition report; the final ranking will be released after results checking.
21st to 26th August, 2023
Announcement of competition results at ICDAR2023.