Tasks - Occluded RoadText

For the competition, we will be using the newly introduced Occluded RoadText dataset, which comprises 1,000 images: 800 in the test set and 200 in the validation set. We do not provide a training set; participants may use the synthetic occluded scene text dataset ISTD-OC [2] and any other available datasets.

Each image contains at least one occluded text instance.

Dataset Details

The dataset annotations are at the line level. Each text instance falls into one of the following categories:

  • occluded_english: English text with partial occlusion
  • english: Unoccluded English text 
  • non_english: Text in languages other than English
  • illegible: Unreadable text (even for humans)

Bounding boxes and transcriptions are provided for the occluded_english and english categories. Only bounding boxes are provided for the non_english and illegible categories.

Task 1: Text Localization

The task is to localize all text instances in each test image using bounding boxes. Illegible text instances are treated as "don't care" during evaluation: a method is not penalized for failing to detect illegible instances, and a method that does detect them does not receive a higher score.
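The "don't care" rule can be made concrete with a per-image counting sketch in Python. It assumes a precomputed IoU matrix between detections and ground truth instances, greedy one-to-one matching in descending IoU order, and that a detection matched to an illegible instance is simply dropped from the counts. These are our assumptions for illustration; `count_matches` is a hypothetical helper, and the official evaluation script may resolve matches differently.

```python
from typing import List, Tuple

IOU_THRESHOLD = 0.5  # a match requires IoU strictly greater than 0.5


def count_matches(iou: List[List[float]], gt_categories: List[str]) -> Tuple[int, int, int]:
    """Return (true_positives, false_positives, false_negatives) for one image.

    iou[d][g] is the IoU between detection d and ground truth instance g.
    Illegible ground truth instances are "don't care": missing them costs
    nothing, and a detection matched to one is neither a TP nor an FP.
    Greedy matching by descending IoU is an assumption of this sketch.
    """
    num_det = len(iou)
    num_gt = len(gt_categories)
    # All candidate (iou, det, gt) pairs above the threshold, best first.
    pairs = sorted(
        ((iou[d][g], d, g) for d in range(num_det) for g in range(num_gt)
         if iou[d][g] > IOU_THRESHOLD),
        reverse=True,
    )
    det_used, gt_used = set(), set()
    tp = ignored = 0
    for _, d, g in pairs:
        if d in det_used or g in gt_used:
            continue  # each detection and each ground truth is matched at most once
        det_used.add(d)
        gt_used.add(g)
        if gt_categories[g] == "illegible":
            ignored += 1  # detection of a don't-care region
        else:
            tp += 1
    num_care_gt = sum(1 for c in gt_categories if c != "illegible")
    fp = num_det - tp - ignored
    fn = num_care_gt - tp
    return tp, fp, fn
```

Under these assumptions, a detection covering an illegible sign neither raises nor lowers the score, while an undetected illegible instance is not counted as a miss.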

Submission Format

[
    {
        "image_id": the unique identifier of the image; must be a string,
        "text": [
            {
                "vertices": the bounding box of the detection as a list of 4 clockwise vertices, each a list of two coordinates [x, y], in the format [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] (the vertices can be integers or floats; they will be cast to float)
            },
            ...
        ]
    },
    ...
]
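Before uploading, it may help to generate and sanity-check the file programmatically. The following Python sketch builds a Task 1 entry and checks the layout described above. The helper names are hypothetical, and the clockwise test assumes image coordinates with the origin at the top-left and y pointing down.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def task1_entry(image_id: str, boxes: Sequence[Sequence[Point]]) -> Dict[str, Any]:
    """Build one Task 1 entry; every vertex coordinate is cast to float."""
    return {
        "image_id": str(image_id),
        "text": [{"vertices": [[float(x), float(y)] for x, y in box]} for box in boxes],
    }


def is_clockwise(vertices: Sequence[Sequence[float]]) -> bool:
    """Clockwise on screen, assuming y points down (image coordinates).

    With y down, a visually clockwise polygon has a positive shoelace sum.
    """
    s = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        s += x1 * y2 - x2 * y1
    return s > 0


def validate_task1(submission: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found (empty if the layout looks valid)."""
    problems = []
    for i, entry in enumerate(submission):
        if not isinstance(entry.get("image_id"), str):
            problems.append(f"entry {i}: image_id must be a string")
        if not isinstance(entry.get("text"), list):
            problems.append(f"entry {i}: 'text' must be a list")
            continue
        for j, inst in enumerate(entry["text"]):
            verts = inst.get("vertices")
            ok = (isinstance(verts, list) and len(verts) == 4
                  and all(isinstance(v, list) and len(v) == 2
                          and all(isinstance(c, (int, float)) for c in v)
                          for v in verts))
            if not ok:
                problems.append(f"entry {i}, instance {j}: need 4 [x, y] vertices")
            elif not is_clockwise(verts):
                problems.append(f"entry {i}, instance {j}: vertices not clockwise")
    return problems


if __name__ == "__main__":
    sub = [task1_entry("img_0001", [[(10, 20), (110, 20), (110, 60), (10, 60)]])]
    print(validate_task1(sub))  # → []
    print(json.dumps(sub))
```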

During evaluation, a detection is considered a match if its Intersection over Union (IoU) with a ground truth bounding box is greater than 0.5.
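Because boxes are quadrilaterals rather than axis-aligned rectangles, IoU has to be computed on polygons. A minimal Python sketch follows, assuming both quadrilaterals are convex (Sutherland-Hodgman clipping is only valid when the clipping polygon is convex); the function names are hypothetical, and the official evaluator may use a different polygon library.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Sequence[float]]) -> float:
    """Shoelace formula; the sign depends on the vertex order."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def clip(subject, clipper) -> List[Point]:
    """Sutherland-Hodgman: intersection of `subject` with the convex `clipper`."""
    sign = 1.0 if signed_area(clipper) > 0 else -1.0

    def inside(p, a, b) -> bool:
        # Same side of edge a->b as the clipper's interior (boundary counts).
        return sign * ((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])) >= 0

    def crossing(p, q, a, b) -> Point:
        # Intersection of segment p->q with the infinite line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = [tuple(v) for v in subject]
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        if not points:
            break  # no overlap left to clip
        prev = points[-1]
        for cur in points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(crossing(prev, cur, a, b))
            prev = cur
    return output


def quad_iou(det, gt) -> float:
    """IoU of two convex quadrilaterals given as lists of [x, y] vertices."""
    inter = clip(det, gt)
    inter_area = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(signed_area(det)) + abs(signed_area(gt)) - inter_area
    return inter_area / union if union > 0 else 0.0


def is_match(det, gt, threshold: float = 0.5) -> bool:
    """Task 1 criterion: IoU strictly greater than the threshold."""
    return quad_iou(det, gt) > threshold
```

For example, two 2×2 squares offset horizontally by one unit share an area of 2 out of a union of 6, so their IoU is 1/3 and they would not count as a match.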

Task 2: Single Image End-to-End Recognition

This task combines text localization and recognition for text lines in images. Recognition results need to be provided only for instances falling under the occluded_english and english categories.

Submission Format

[
    {
        "image_id": the unique identifier of the image; must be a string,
        "text": [
            {
                "transcription": the transcription of the detection; must be a string,
                "vertices": the bounding box of the detection as a list of 4 clockwise vertices, each a list of two coordinates [x, y], in the format [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] (the vertices can be integers or floats; they will be cast to float)
            },
            ...
        ]
    },
    ...
]
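A minimal Python sketch for producing a Task 2 file. The helper names and the output path are hypothetical; the only format facts it relies on are those listed above (a string image_id, a string transcription, and four vertices cast to float).

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]


def task2_entry(image_id: str, detections: Sequence[Tuple[str, Box]]) -> Dict[str, Any]:
    """One Task 2 entry: each detection carries a string transcription and 4 vertices."""
    return {
        "image_id": str(image_id),
        "text": [
            {
                "transcription": str(text),
                "vertices": [[float(x), float(y)] for x, y in box],
            }
            for text, box in detections
        ],
    }


def write_submission(path: str, entries: List[Dict[str, Any]]) -> None:
    """Write the submission list as UTF-8 JSON (path is a placeholder)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters readable in the file.
        json.dump(entries, f, ensure_ascii=False, indent=2)
```

Since the Task 3 format is the same as Task 2, the same helper could be reused there, with entries only for the test images.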

We follow the evaluation protocol of Wang et al. [1]. A detection is considered a match if it meets both of the following criteria:

  • It overlaps a ground truth bounding box with an Intersection over Union (IoU) greater than 0.5.
  • If the ground truth instance has a transcription, the predicted transcription matches it exactly (case-insensitive).
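The two criteria above can be sketched as a single predicate in Python. It takes a precomputed IoU value; using `str.lower()` for the case-insensitive comparison, and applying no other normalization (such as whitespace or punctuation stripping), are assumptions of this sketch rather than documented rules.

```python
from typing import Optional


def transcription_matches(pred: str, gt: Optional[str]) -> bool:
    """Case-insensitive exact match; a null (None) ground truth transcription is not checked."""
    if gt is None:
        return True  # only the IoU criterion applies
    return pred.lower() == gt.lower()


def is_end_to_end_match(iou: float, pred: str, gt: Optional[str]) -> bool:
    """Task 2/3 criterion: IoU > 0.5 and, when a transcription exists, an exact match."""
    return iou > 0.5 and transcription_matches(pred, gt)
```

For instance, a prediction "Stop" matches a ground truth "STOP" at IoU 0.7, but not at an IoU of exactly 0.5.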

Task 3: Multi Image End-to-End Recognition

This task extends Task 2 by providing, for each test image, two supplementary images of the same scene taken from different angles to aid recognition. Otherwise, the task is the same as Task 2. The supplementary images must not be used for Tasks 1 and 2. Detection and recognition results should be provided only for the test image, not for the supplementary images.

Figure 2: The test image (middle) and its two supplementary images (left and right)

Submission Format

Same as Task 2

The evaluation is the same as for Task 2.

A custom F1 score, calculated from the overall precision and recall on the occluded_english category, is used to rank the submissions.
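The exact definition of the custom score is not given here. As a point of reference, the Python sketch below computes the standard F1 (the harmonic mean of precision and recall) from true-positive, detection, and ground truth counts summed over all images, assuming those counts have already been restricted to the occluded_english category; how detections are attributed to that category is left open.

```python
from typing import Tuple


def precision_recall_f1(tp: int, num_det: int, num_gt: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 from counts pooled over the whole test set."""
    precision = tp / num_det if num_det else 0.0
    recall = tp / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 6 correct matches out of 8 detections against 12 ground truth instances give P = 0.75, R = 0.5 and F1 = 0.6.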

[val] Ground Truth Format for Tasks 1, 2 and 3

[
    {
        "image_id": file name of the image in the validation set; a string,
        "text": [
            {
                "transcription": the transcription of the text, or null if no transcription is provided,
                "category": the category of the text instance (occluded_english, english, non_english or illegible),
                "vertices": the bounding box of the text instance as a list of 4 clockwise vertices, each a list of two coordinates [x, y], in the format [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] (the vertices can be integers or floats; they will be cast to float)
            },
            ...
        ]
    },
    ...
]
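A short Python sketch for inspecting the validation ground truth. The file path passed to `load_ground_truth` is a placeholder, and the functions only assume the keys shown above.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def load_ground_truth(path: str) -> List[Dict[str, Any]]:
    """Read the validation ground truth list (path is a placeholder)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def category_counts(gt: List[Dict[str, Any]]) -> Counter:
    """Count annotated text instances per category across all images."""
    return Counter(inst["category"] for entry in gt for inst in entry["text"])


def untranscribed(gt: List[Dict[str, Any]]) -> List[str]:
    """Image ids containing at least one instance whose transcription is null."""
    return [entry["image_id"] for entry in gt
            if any(inst.get("transcription") is None for inst in entry["text"])]
```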

[test, val] Supplementary Data for Task 3

{
    "image_id": [supplementary_image_1, supplementary_image_2]
}

Here image_id is the file name of the test or validation image, and supplementary_image_1 and supplementary_image_2 are the file names of its two supplementary images.
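A Python sketch for looking up the extra views of a scene. It assumes the supplementary data is a single JSON object whose keys are image file names; the helper names and the example file names are hypothetical.

```python
import json
from typing import Dict, List


def load_supplementary(path: str) -> Dict[str, List[str]]:
    """Read the Task 3 mapping: image file name -> its two supplementary file names."""
    with open(path, encoding="utf-8") as f:
        mapping = json.load(f)
    for image_id, extras in mapping.items():
        if len(extras) != 2:
            raise ValueError(f"{image_id}: expected 2 supplementary images, got {len(extras)}")
    return mapping


def views_for(mapping: Dict[str, List[str]], image_id: str) -> List[str]:
    """All views of a scene; results are reported only for views[0], the test image."""
    return [image_id, *mapping.get(image_id, [])]
```

The supplementary views are only an aid: any detections or transcriptions derived from them would still be reported against the test image's own image_id.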

FAQ

Q. Do we need to provide bounding boxes for the non-English category?
A. Yes. Only the illegible category is treated as "don't care", in all tasks.

Q. In tasks 2 and 3, if the ground truth transcription is null, is it considered in the evaluation?
A. Yes. If the ground truth transcription is null, only the overlap with the ground truth bounding box is checked.

Q. How many submissions can I make?
A. There is no limit on the number of submissions, but please email us to let us know which submission you intend to use for the competition. Otherwise, we will consider your last submission as your competition entry.

References

[1] K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), November 2011, pp. 1457-1464.

[2] A. G. Soares, B. L. D. Bezerra, and E. B. Lima, "How Far Deep Learning Systems for Text Detection and Recognition in Natural Scenes are Affected by Occlusion?," in ICDAR 2021 Workshop on Camera-Based Document Analysis and Recognition, Proceedings of the 16th International Conference on Document Analysis and Recognition, Lausanne, 2021.

Important Dates

15 January 2024: Competition announced

19 February 2024: Validation data released

5 March 2024: Test data released

20 April 2024: Submission site opens

10 May 2024: Deadline for competition submissions

All deadlines are in the Anywhere on Earth (AoE) time zone.