Tasks - ICDAR 2023 Competition on Text-based Video Question Answering on News Videos
Task
The objective of this task is to answer questions about a news video by reading the textual content present in it. The news videos are sourced from multiple news channels across YouTube. Each video is 10 seconds long. Answers are short, typically about 2-3 words. Each entry in the annotation file contains the fields "question_id", "video_id", "timestamp", "question" and "answer". We also provide evenly sampled frames from each video together with their OCR information (refer to the Downloads section for more details).
Submission Format
Results are expected to be submitted as a single JSON file (extension .json) containing a list of dictionaries, each with two keys: "question_id" and "answer". The "question_id" key holds the unique id of the question, while the "answer" key holds the model's output for that question. As an example, the result file might be named result_task1.json and contain a list similar to:
[
{"question_id": 1, "answer": "Digital currency"},
{"question_id": 2, "answer": "Price surge"},
{"question_id": 3, "answer": "Money laundering"},
...
]
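As an illustration of the format described above, the following is a minimal sketch of how a result file could be written with Python's standard json module. The helper name write_submission and the shape of its input (a dictionary from question id to answer string) are assumptions made for this example, not part of the competition kit; the key names follow the "question_id" and "answer" keys named in the description.

```python
import json


def write_submission(predictions, path="result_task1.json"):
    """Write predictions as a JSON list of {"question_id", "answer"} dicts.

    `predictions` is assumed to map question ids to answer strings
    (a hypothetical input shape chosen for this sketch).
    """
    records = [
        {"question_id": question_id, "answer": answer}
        for question_id, answer in sorted(predictions.items())
    ]
    with open(path, "w", encoding="utf-8") as handle:
        # json.dump emits double-quoted strings, as valid JSON requires.
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return records


if __name__ == "__main__":
    write_submission({
        1: "Digital currency",
        2: "Price surge",
        3: "Money laundering",
    })
```

Serializing through json.dump rather than printing a Python list avoids single-quoted strings, which are not valid JSON.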
Evaluation Metric
We will use Average Normalized Levenshtein Similarity (ANLS) as the evaluation metric. For further details on the metric, refer to Task 3 of the scene text VQA challenge.
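For a rough local check, ANLS can be sketched as below. This follows the definition commonly used for scene text VQA: the per-question score is one minus the normalized Levenshtein distance to the closest ground-truth answer, set to zero when that distance reaches a threshold, and then averaged over all questions. The threshold of 0.5, the case-insensitive comparison, and the function names are assumptions of this sketch; the official evaluation script may differ.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def anls(predictions, ground_truths, threshold=0.5):
    """Average Normalized Levenshtein Similarity over all questions.

    predictions:   dict of question_id -> predicted answer string
    ground_truths: dict of question_id -> list of accepted answer strings
    threshold:     assumed cut-off; distances at or above it score 0
    """
    scores = []
    for question_id, answers in ground_truths.items():
        predicted = predictions.get(question_id, "").strip().lower()
        best = 0.0
        for answer in answers:
            truth = answer.strip().lower()
            longest = max(len(predicted), len(truth))
            distance = levenshtein(predicted, truth) / longest if longest else 0.0
            similarity = 1.0 - distance if distance < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

A missing prediction is treated as an empty string here, so it scores zero against any non-empty ground truth.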
Important Dates
24 - 31 December 2022: Initial website launch
24 - 31 December 2022: Initial training data release
16 February 2023: Full training data along with test data release
20 March 2023: Deadline for Competition submissions
10 April 2023: Initial submission of competition report
21 - 26 August 2023: Result announcement and presentation