Tasks - ICDAR 2023 Competition on Text-based Video Question Answering on News Videos


The objective of this task is to answer questions about a news video by reading the textual content present in it. The news videos are sourced from multiple news channels on YouTube. Each video is 10 seconds long. Answers are short, typically about 2-3 words. The annotation file contains the fields "question_id", "video_id", "timestamp", "question", and "answer". We also provide evenly sampled frames from each video, along with the OCR information for those frames (refer to the Downloads section for more details).

Submission Format

Results must be submitted as a single JSON file (extension .json) containing a list of dictionaries, each with two keys: "question_id" and "answer". The "question_id" key holds the unique ID of the question, and the "answer" key holds the model's output for that question. For example, the result file might be named result_task1.json and contain a list similar to:

    [
        {"answer": "Digital currency", "question_id": 1},
        {"answer": "Price surge", "question_id": 2},
        {"answer": "Money laundering", "question_id": 3}
    ]
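
As a rough illustration of the format described above, the following Python sketch writes a result file from a mapping of question IDs to predicted answers. The function names `build_submission` and `write_submission` and the shape of the `predictions` input are assumptions made for this example, not part of the official tooling. The keys follow the "question_id" and "answer" names given in the text.

```python
import json


def build_submission(predictions):
    """Turn a {question_id: answer} mapping into the list-of-dictionaries layout.

    `predictions` is a hypothetical input shape chosen for this sketch.
    """
    return [
        {"question_id": question_id, "answer": answer}
        for question_id, answer in predictions.items()
    ]


def write_submission(predictions, path):
    """Serialize the submission list to a single JSON file at `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_submission(predictions), f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    example = {1: "Digital currency", 2: "Price surge", 3: "Money laundering"}
    write_submission(example, "result_task1.json")
```

Using `json.dump` rather than hand-formatting the file guarantees double-quoted strings and balanced brackets, which a JSON parser on the evaluation side will require.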

Evaluation Metric

We will use Average Normalized Levenshtein Similarity (ANLS) as the evaluation metric. For further details on the metric, refer to Task 3 of the Scene Text VQA challenge.
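
For a local sanity check before submitting, the sketch below computes ANLS as it is commonly defined for scene text VQA. For each question, the score is one minus the normalized Levenshtein distance to the closest ground-truth answer, set to zero when that distance reaches a threshold, and the scores are averaged over all questions. The threshold of 0.5, the lowercasing and whitespace stripping, and the input shapes are assumptions made for this example. The official evaluation script remains the authority on the exact behavior.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nls(pred, gt, threshold=0.5):
    """Normalized Levenshtein similarity with a cutoff (threshold is assumed)."""
    # Case folding and stripping are assumptions of this sketch.
    pred, gt = pred.strip().lower(), gt.strip().lower()
    if not pred and not gt:
        return 1.0
    nl = levenshtein(pred, gt) / max(len(pred), len(gt))
    return 1.0 - nl if nl < threshold else 0.0


def anls(predictions, ground_truths, threshold=0.5):
    """Average the best per-question similarity over all questions.

    predictions: {question_id: answer string} (hypothetical shape)
    ground_truths: {question_id: list of accepted answers} (hypothetical shape)
    Questions missing from `predictions` are scored as an empty answer.
    """
    if not ground_truths:
        return 0.0
    total = 0.0
    for question_id, answers in ground_truths.items():
        pred = predictions.get(question_id, "")
        total += max((nls(pred, gt, threshold) for gt in answers), default=0.0)
    return total / len(ground_truths)
```

The cutoff means a near miss such as a one-character OCR error still earns partial credit, while an answer that differs from the ground truth in half or more of its characters earns nothing.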

Important Dates

24 - 31 December 2022: Initial website launch

24 - 31 December 2022: Initial training data release

16 February 2023: Full training data along with test data release

20 March 2023: Deadline for Competition submissions

10 April 2023: Initial submission of competition report

21 - 26 August 2023: Result announcement and presentation