Tasks - ICDAR 2019 Robust Reading Challenge on Multi-lingual scene text detection and recognition

To participate in the RRC-MLT-2019 challenge, you must take part in at least one task. The tasks are described below. The first three tasks are similar to those in RRC-MLT-2017, but they are re-opened for RRC-MLT-2019 with a new language added to the dataset and improved ground truth quality for the whole dataset. We are also introducing a new fourth task on End-to-End text detection and recognition.

Task-1: Multi-script text detection

In this task, a participant method should be able to generalize to detecting text of different scripts. The input to this task is scene images with embedded text in various languages, and the required detection is at word level.

Ground Truth (GT) Format

NOTE: the GT provided for this task contains more information than needed for this task, because this GT is shared with Tasks 3 and 4 as well. So, please make sure the results format generated by your method is as described in the "Results Format" paragraph.

The ground truth is provided in terms of word bounding boxes. Bounding boxes are NOT axis oriented; they are specified by the coordinates of their four corners in clockwise order. For each image in the training set, a corresponding UTF-8 encoded text file is provided, following the naming convention:

gt_[image name].txt

The text files are comma-separated files, where each line corresponds to one text block in the image and gives its bounding box coordinates (four corners, clockwise), its script and its transcription in the format:

x1,y1,x2,y2,x3,y3,x4,y4,script,transcription

Valid scripts are: "Arabic", "Latin", "Chinese", "Japanese", "Korean", "Bangla", "Hindi", "Symbols", "Mixed", "None"

Note that the transcription is anything that follows the 9th comma until the end of line. No escape characters are to be used.

If the transcription is provided as "###", then the text block (word) is considered as "don't care". Some of the "don't care" words have a script class that corresponds to a language, and others have a "None" script class. The latter case occurs when the word script cannot be identified due to low resolution or other distortions.
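The parsing rule above (the transcription is everything after the 9th comma) can be sketched as follows. This is a minimal illustration, not an official parser: the names GTWord and parse_gt_line are hypothetical, coordinates are read as floats as a cautious assumption, and stripping a byte-order mark is likewise an assumption about how some files might be saved.

```python
from typing import List, NamedTuple, Tuple


class GTWord(NamedTuple):
    corners: List[Tuple[float, float]]  # four (x, y) corners, clockwise
    script: str
    transcription: str
    dont_care: bool  # True when the transcription is "###"


def parse_gt_line(line: str) -> GTWord:
    """Parse one line of a gt_[image name].txt file (hypothetical helper)."""
    # Drop the line ending and a possible byte-order mark (assumption: a file
    # may start with one; opening with encoding="utf-8-sig" also handles it).
    line = line.rstrip("\r\n").lstrip("\ufeff")
    # Split on the first 9 commas only: everything after the 9th comma is the
    # transcription, which may itself contain commas.
    parts = line.split(",", 9)
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}: {line!r}")
    values = [float(v) for v in parts[:8]]
    corners = list(zip(values[0::2], values[1::2]))
    script, transcription = parts[8], parts[9]
    return GTWord(corners, script, transcription, transcription == "###")
```

Limiting the split to 9 commas is what keeps a transcription such as "Hello, world" intact, since the format uses no escape characters.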

Results Format

Localisation (detection) results are expected as follows: One UTF-8 encoded text file per test image is expected. Participants are asked to submit all results in a single zip file. Result files should be named after test image IDs following the naming convention:

res_[image name].txt 

(e.g. res_1245.txt). Each line should correspond to one word in the image and provide its bounding box coordinates (four corners, clockwise) and a confidence score in the format:

x1,y1,x2,y2,x3,y3,x4,y4,confidence
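As an illustration of the results format, the sketch below writes one res_[image name].txt file per image into a single zip archive. The function names are hypothetical, and two details are assumptions not stated above: coordinates are rounded to integers and the confidence is written with four decimals.

```python
import io
import zipfile


def format_detection(corners, confidence):
    """Build one result line: x1,y1,x2,y2,x3,y3,x4,y4,confidence."""
    # Assumption: integer pixel coordinates and a 4-decimal confidence.
    flat = [str(int(round(v))) for point in corners for v in point]
    if len(flat) != 8:
        raise ValueError("a detection needs exactly four (x, y) corners")
    return ",".join(flat + [f"{confidence:.4f}"])


def write_results_zip(target, detections_by_image):
    """Write res_[image name].txt files into one zip.

    target is a file path or a writable binary file object;
    detections_by_image maps an image name to a list of
    (corners, confidence) pairs.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for image_name, detections in detections_by_image.items():
            lines = [format_detection(c, s) for c, s in detections]
            # One UTF-8 encoded text file per test image.
            zf.writestr(f"res_{image_name}.txt", "\n".join(lines).encode("utf-8"))
```

An image with no detections still gets an (empty) result file in this sketch, so that every test image has a matching file in the archive.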

Task-2: Cropped Word Script identification

The text in our dataset images appears in 10 different languages, some of which share the same script. Additionally, punctuation and some math symbols sometimes appear as separate words; those words are assigned a special script class called "Symbols". Hence, we have a total of 8 different scripts. We have excluded the words that have "Mixed" script for this task. We have also excluded all "don't care" words, whether or not they have an identified script.

Ground Truth Format

For the word script identification task, we provide all the words (cropped words) in our dataset as separate image files, along with the corresponding ground-truth script and transcription. The transcription is not used in this task and can be ignored. For each text block, the axis oriented area that tightly contains the text block is provided.

The script and transcription of all words are provided in a SINGLE UTF-8 text file for the whole collection. Each line in the ground truth file has the following format:

[word image name],script,transcription

Note that the transcription is anything that follows the 2nd comma until the end of line. No escape characters are to be used. Valid scripts are "Arabic", "Latin", "Chinese", "Japanese", "Korean", "Bangla", "Hindi", "Symbols".
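A minimal sketch of reading one line of this file, assuming the line is already decoded from UTF-8. The helper name parse_word_gt_line and the example file name in the comment are hypothetical; the set of scripts is the list given above.

```python
TASK2_SCRIPTS = {
    "Arabic", "Latin", "Chinese", "Japanese",
    "Korean", "Bangla", "Hindi", "Symbols",
}


def parse_word_gt_line(line):
    """Return (word_image_name, script, transcription) for one GT line."""
    # Split on the first 2 commas only, so that commas inside the
    # transcription are preserved (no escape characters are used).
    parts = line.rstrip("\r\n").split(",", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    name, script, transcription = parts
    if script not in TASK2_SCRIPTS:
        raise ValueError(f"unexpected script {script!r} in line {line!r}")
    return name, script, transcription


# Example with a made-up file name:
# parse_word_gt_line("word_1.png,Latin,a,b") gives ("word_1.png", "Latin", "a,b")
```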

In addition, we provide information about the original image from which each word image was extracted. The relative coordinates of the (non-axis oriented) bounding box that defines the text block within the cut-out text block image are provided in a separate SINGLE text file for the whole collection. The coordinates are given in reference to the cut-out box, as the four corners of the bounding box in clockwise order. Each line in this file has the following format:

[word image name], x1, y1, x2, y2, x3, y3, x4, y4,[original image name]
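The format line above shows spaces after some commas, so a tolerant reader can strip whitespace around every field. The sketch below assumes that image names contain no commas and reads coordinates as floats; the function name and the file names in the comment are hypothetical.

```python
def parse_crop_box_line(line):
    """Return (word_image_name, corners, original_image_name)."""
    # Strip whitespace around each field, since the separator may be
    # "," or ", " (assumption based on the format shown above).
    fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    values = [float(v) for v in fields[1:9]]
    # Pair up x and y values into four clockwise corners, given relative
    # to the cut-out word image rather than to the original scene image.
    corners = list(zip(values[0::2], values[1::2]))
    return fields[0], corners, fields[9]


# Example with made-up names:
# parse_crop_box_line("word_1.png, 0, 2, 50, 0, 52, 20, 2, 22,tr_img_00001.jpg")
```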

Results Format

A participant method should provide the script of each image, where each input image is a cropped word image (cut-out text block from a scene image). A single script name per image is requested. All the output scripts should be listed in a single UTF-8 encoded text file, one script per word image, using the following format:

[word image name],script

Task-3: Joint text detection and script identification

This task combines all the preparation steps needed for multi-script text recognition. A participant method should take a full scene image as input, find the bounding boxes of all the words, and identify the script of each word.

Ground Truth Format

The ground truth is provided in the same format as in Task 1.

Results Format

Joint detection and script identification results should be provided in a single zip file. A text file per image is expected. The file should be named after the test image ID, using the following naming convention:

res_[image name].txt 

Inside each text file, a list of detected bounding box coordinates (four corners, clockwise), along with the confidence of the detection and the script class, should be provided:

x1,y1,x2,y2,x3,y3,x4,y4,confidence,script

Task-4: End-to-End text detection and recognition

This is a very challenging task: unified OCR for multiple languages. The end-to-end scene text detection and recognition task in a multi-language setting is consistent with its English counterparts. Given an input scene image, the objective is to localize a set of bounding boxes and their corresponding transcriptions.

Ground Truth Format

The ground truth is provided in the same format as in Task 1.

Results Format

Joint detection and recognition results should be provided in a single zip file. A text file per image is expected. The file should be named after the test image ID, using the following naming convention:

res_[image name].txt 

Inside each text file, a list of detected bounding box coordinates (four corners, clockwise), along with the transcription of the detection, should be provided:

x1,y1,x2,y2,x3,y3,x4,y4,transcription
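A small sketch of writing and reading such a line. It assumes, by analogy with the ground truth format, that the transcription is everything after the 8th comma; since each line holds one word, the sketch rejects transcriptions containing line breaks. The function names are hypothetical, and rounding coordinates to integers is an assumption.

```python
def format_e2e_line(corners, transcription):
    """Build one line: x1,y1,x2,y2,x3,y3,x4,y4,transcription."""
    if "\n" in transcription or "\r" in transcription:
        raise ValueError("a transcription must fit on a single line")
    # Assumption: integer pixel coordinates.
    flat = [str(int(round(v))) for point in corners for v in point]
    if len(flat) != 8:
        raise ValueError("a detection needs exactly four (x, y) corners")
    return ",".join(flat) + "," + transcription


def parse_e2e_line(line):
    """Return (corners, transcription) from one result line."""
    # Split on the first 8 commas only, so commas in the transcription survive.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {line!r}")
    values = [float(v) for v in parts[:8]]
    return list(zip(values[0::2], values[1::2])), parts[8]
```

Round-tripping a line through both functions is a quick way to check that a transcription containing commas is not truncated.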

 

Important Dates

15 Feb to 2 May
Manifestation of interest by participants opens
Asking/Answering questions about the details of the competition

1 Mar
Competition formal announcement

15 Mar
Website fully ready
Registration of participants continues
Evaluation protocol, file formats etc. available

15 Mar to 2 May
Train set available - training period - MLT challenge in progress - Participants evaluate their methods on the training/validation sets - Prepare for submission
Registration is still open

2 May
Registration closes for this MLT challenge for ICDAR-2019

2 May to 3 June
Test set available

3 June
Deadline for submission of results by participants

20 - 25 Sept
Announcement of results at ICDAR2019

1 Oct
The public release of the full dataset