Tasks - ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering
The Challenge is structured around three tasks, all of them new for the 2019 edition of the competition:
- Strongly Contextualised, where the complete list of words appearing in the image, plus some distractors, is provided.
- Weakly Contextualised, where participants have at hand the full list of possible answers for the complete dataset.
- End-to-end, where no predefined list of possible answers is given, and the correct answer has to be generated automatically by processing the image context and reading and understanding the textual information in the image.
Dataset and Tools
The SceneText-VQA dataset comprises over 15,000 images with at least three question/answer pairs per image. Training, validation, and test splits are provided. An example of the type of questions and answers to be expected is given in Figure 1.
Along with the dataset, we offer a set of utility functions and scripts for the evaluation and visualisation of submitted results, both through the RRC online platform, and as stand-alone code and utilities that can be used offline (the latter provided after the competition has finished).
Task 1 - Strongly Contextualised
In this first task, the participants will be provided with a different list of possible answers for each image. The list will comprise some of the words that appear within the image, plus some extra words acting as distractors. As such, for each image a relatively small but different set of possible answers will be provided. For the example image above, the participant would be given a list including the words below, plus distractors:
[ Public, Market, Center, Coca-Cola, Farmers, Enjoy, … ]
Task 2 - Weakly Contextualised
In this task, the participants will be provided with the full list of possible answers for the complete dataset, complemented with some distractor words. Although the list of possible answers is the same (a static list) for all images in the dataset, it is considerably longer than the per-image sets of answers from the previous task.
Task 3 - End-to-end
The end-to-end task is the most generic and challenging one, since no set of answers is provided a priori. The submitted methods for this task should be able to generate the correct answers by analysing the image's visual context and by reading and understanding all textual information contained in the image.
In all three tasks, the evaluation metric will be based on accuracy: the fraction of question/answer pairs for which the proposed method delivers the correct answer.
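As an illustration, the accuracy measure described above can be sketched as follows. This is a minimal, hypothetical implementation assuming a case- and whitespace-insensitive exact-match comparison; the official evaluation scripts provided through the RRC platform may apply different normalisation.

```python
def vqa_accuracy(predictions, ground_truths):
    """Fraction of question/answer pairs answered correctly.

    Hypothetical sketch: exact string match after lowercasing and
    stripping surrounding whitespace. The official scorer may differ.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("prediction/ground-truth length mismatch")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)

# Example: one of two answers matches (comparison is case-insensitive)
print(vqa_accuracy(["Coca-Cola", "Enjoy"], ["coca-cola", "Market"]))  # → 0.5
```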
Important Dates
12 February 2019: Web site online
28 February 2019: Training and Validation set available
15 April 2019: Test set available
30 April 2019: Submission of results deadline
10 May 2019: Deadline for providing short descriptions of the participating methods
20-25 September 2019: Results presentation