Tasks - ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering

The Challenge contains three tasks, all of which are new for the 2019 edition of the competition:

  • Strongly Contextualized, where the complete list of answers per image plus some dictionary words are provided.
  • Weakly Contextualized, where the participants will have at hand the full list of possible answers for the complete dataset plus some dictionary words.
  • End-to-end, where no predefined list of possible answers is given, and the correct answer has to be generated automatically by processing the image context and by reading and understanding the textual information in the image.

Dataset and Tools

The SceneText-VQA dataset comprises 23,000 images with up to three question/answer pairs per image. Train and test splits are provided. The train set consists of 19,000 images with 26,000 questions, while the test set consists of 3,000 images with 4,000 questions per task. An example of the type of questions and answers to be expected is given in Figure 1.

Figure 1. A possible question/answer pair for this image might be:
(Q) Which soda brand appears in the bottom of the image? (A) Coca-Cola.

Along with the dataset, we offer a set of utility functions and scripts for the evaluation and visualisation of submitted results, both through the RRC online platform, and as stand-alone code and utilities that can be used offline (the latter provided after the competition has finished).

Task 1 - Strongly Contextualised

In this first task, participants will be provided with a different list of possible answers for each image. The list will comprise some of the words that appear within the image, plus some extra dictionary words. As such, each image will have a relatively small but different set of possible answers. For the example image above, the participant would be given a list including the words below, plus some dictionary words:

[ Public, Market, Center, Coca-Cola, Farmers, Enjoy, … ]
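One way a participant might use such a per-image list is to map a raw model output (for example, a noisy OCR reading) to its closest candidate. The sketch below is only an illustration under that assumption, not a method prescribed by the challenge; the function name `snap_to_candidates` and the `cutoff` value are hypothetical, and the matching relies on the similarity ratio of Python's standard `difflib` module rather than on the challenge metric.

```python
import difflib


def snap_to_candidates(raw_answer, candidates, cutoff=0.6):
    """Return the candidate closest to raw_answer, or None if nothing is close.

    Comparison is case-insensitive; the candidate is returned in its
    original casing. The cutoff is an arbitrary illustrative value.
    """
    # Map lowercased candidates back to their original spelling.
    lowered = {candidate.lower(): candidate for candidate in candidates}
    matches = difflib.get_close_matches(
        raw_answer.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Candidate list taken from the example above (dictionary words omitted).
candidates = ["Public", "Market", "Center", "Coca-Cola", "Farmers", "Enjoy"]
best = snap_to_candidates("coca-c0la", candidates)
```

The same idea would carry over to the static list of Task 2, although a linear scan over a much larger list is slower.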


Task 2 - Weakly Contextualised

In this task, participants will be provided with the full list of possible answers for the complete dataset, complemented with some dictionary words. Although the list of possible answers is the same (a static list) for all images in the dataset, it is considerably larger than the per-image answer sets of the previous task. The dictionary comprises 30,000 words, formed by collecting all 22k ground truth words plus 8k generated vocabulary words.

Task 3 - Open Dictionary

The end-to-end task is the most generic and challenging one, since no set of answers is provided a priori. Methods submitted for this task should be able to generate the correct answers by analysing the image's visual context and by reading and understanding all the textual information contained in the image.

Evaluation Metric

In all three tasks, the evaluation metric will be the Average Normalized Levenshtein distance (ANLd). The ANLd uses a threshold of 0.5: the output of the metric is the ANLd value if that value is equal to or greater than 0.5, and 0 otherwise.

More formally, the ANLd between the net output and the ground truth answers is given by:


The metric is not case sensitive, but it is space sensitive. For example:


  Q: What soft drink company name is on the red disk?

  Possible different answers:

  • u1: Coca Cola

  • u2: Coca Cola Company
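The thresholded scoring described in this section can be sketched roughly as follows. This is an illustrative reading of the text, not the official evaluation script: it assumes the per-answer score is one minus the Levenshtein distance divided by the length of the longer string, that this score is kept when it is at least 0.5 and set to 0 otherwise, that the best score over all accepted ground truth answers is taken for each question, and that scores are averaged over questions. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def thresholded_similarity(prediction, truth, threshold=0.5):
    """Score in [0, 1]; case-insensitive, but spaces are compared as-is."""
    p, t = prediction.lower(), truth.lower()
    longest = max(len(p), len(t))
    if longest == 0:
        return 1.0
    similarity = 1.0 - levenshtein(p, t) / longest
    # Assumption: scores below the threshold count as 0.
    return similarity if similarity >= threshold else 0.0


def average_score(predictions, ground_truths, threshold=0.5):
    """Average, over questions, of the best score against any accepted answer."""
    scores = [
        max(thresholded_similarity(pred, truth, threshold) for truth in truths)
        for pred, truths in zip(predictions, ground_truths)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a prediction of "coca cola" matches the answer "Coca Cola" exactly because case is ignored, while "CocaCola" is penalised for the missing space.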



Submission Format

Submit exactly one file per task. Each file should be formatted as a JSON file containing a list of dictionaries, each with two keys: "question_id" and "answer". The "question_id" key holds the unique id of the question, while the "answer" key holds the model's output. As an example, the result file might be named result_task1.json and contain a list similar to:


    [
    {"answer": "Coca", "question_id": 1},
    {"answer": "stop", "question_id": 2},
    {"answer": "delta", "question_id": 3}
    ]
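A result file in this format could be produced with a short script along the following lines. This is a minimal sketch: the helper names `build_submission` and `write_submission` are hypothetical, and it assumes that predictions are held in a dictionary mapping integer question ids to answer strings. Python's standard `json` module writes the double-quoted strings that JSON requires.

```python
import json


def build_submission(predictions):
    """Turn {question_id: answer} into the list-of-dictionaries layout."""
    return [
        {"question_id": int(question_id), "answer": str(answer)}
        for question_id, answer in sorted(predictions.items())
    ]


def write_submission(predictions, path):
    """Write one JSON result file for a single task."""
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII answers readable in the file.
        json.dump(build_submission(predictions), handle,
                  ensure_ascii=False, indent=2)


# Example predictions mirroring the entries shown above.
example_predictions = {1: "Coca", 2: "stop", 3: "delta"}
# write_submission(example_predictions, "result_task1.json")
```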







Important Dates

12 February 2019: Web site online

8 March 2019: Training set available

15 April 2019: Test set available

30 April 2019: Submission of results deadline

10 May 2019: Deadline for providing short descriptions of the participating methods

20-25 September 2019: Results presentation