Tasks - ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
The ReCTS dataset includes 25,000 labeled images collected in the wild with phone cameras under uncontrolled conditions. It mainly focuses on Chinese text on restaurant signboards.
The dataset is split into a training set of 20,000 images and a test set of 5,000 images. Four tasks are introduced: (1) character recognition, (2) text line recognition, (3) text line detection and (4) end-to-end text spotting.
Every image in the dataset is annotated with text line locations, character locations and the transcripts of text lines and characters. Locations are annotated as polygons with four vertices, given in clockwise order starting from the top-left vertex. Transcripts are UTF-8 encoded strings.
EXTERNAL DATA: Publicly, freely available external data is permitted. The source of any external data must be mentioned in the final short descriptions of the participating methods.
Note that a half-width character and its corresponding full-width character are regarded as the same character when evaluating. A list of each half-width character and its corresponding full-width character is provided in the file https://rrc.cvc.uab.es/files/half_width_full_width_dict.zip. Also, English letters are not case sensitive in the evaluation of Task 2 and Task 4.
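For reference, a minimal normalization sketch in Python (assuming Unicode NFKC folding covers the half-width/full-width pairs; the dictionary file above remains the authoritative mapping and should be checked against this approximation):

import unicodedata

def normalize_for_eval(text: str) -> str:
    # Fold full-width characters (e.g. "ＡＢＣ１２３") to their
    # half-width equivalents. NFKC is used here as an approximation
    # of the official half-width/full-width dictionary.
    folded = unicodedata.normalize("NFKC", text)
    # English letters are not case sensitive in Task 2 and Task 4.
    return folded.lower()

assert normalize_for_eval("ＡＢＣ１２３") == "abc123"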
We do not provide any feedback on submissions during the challenge. Each team is allowed to submit at most 5 results, and we will choose the best of them as the final result. Besides, each participant must provide their real name and organization. After the team information is submitted, it cannot be revised.
Ground Truth Format
For each image, we use a json file named [img_name].json to store the ground-truths in a structured format as follows:
{
  "chars": [
    {"points": [x1, y1, x2, y2, x3, y3, x4, y4], "transcription": "trans1", "ignore": 0},
    {"points": [x1, y1, x2, y2, x3, y3, x4, y4], "transcription": "trans2", "ignore": 0}
  ],
  "lines": [
    {"points": [x1, y1, x2, y2, x3, y3, x4, y4], "transcription": "trans3", "ignore": 0}
  ]
}
where x1,y1,x2,y2,x3,y3,x4,y4 in "points" are the coordinates of the polygon bounding box, "chars" holds single-character annotations and "lines" holds text line annotations. "transcription" denotes the text of each character or text line, and "ignore" marks a "Do Not Care" text region when it is set to 1. A sample image and its corresponding ground truth can be downloaded from https://rrc.cvc.uab.es/files/ReCTS_sample_gt.zip.
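As an illustration, the ground truth for one image can be loaded as in the following minimal Python sketch (the file name is hypothetical):

import json

def load_ground_truth(json_path):
    # Returns (chars, lines); each entry is a dict with "points"
    # (8 coordinates x1,y1,...,x4,y4 in clockwise order),
    # "transcription" (a UTF-8 string) and "ignore" (0 or 1).
    with open(json_path, encoding="utf-8") as f:
        gt = json.load(f)
    return gt["chars"], gt["lines"]

chars, lines = load_ground_truth("train_000001.json")  # hypothetical name
for line in lines:
    if not line["ignore"]:  # skip "Do Not Care" regions
        print(line["points"], line["transcription"])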
Ground Truth Ambiguity
On some signboards, the following ambiguity arises:
It is difficult to determine whether boxes such as "砂锅", "炒面", "拌面", "烩肉" and "泡馍" should be merged into one large text box or kept as separate boxes. Therefore, we regard both cases ((a) and (b)) as correct ground truth.
We will provide one or more ground truths for each test image if possible. When evaluating, we compare the predicted result with all the ground truths and use the best matched one to calculate the evaluation metrics.
Task 1. Character Recognition in the Signboard
The aim of this task is to recognize characters from the cropped character image. The input examples are shown in Figure 1.
Figure 1. Example cropped character images
Submission Format
Participants will be asked to submit a txt file containing results for all test images. The results format is:
img_name,transcription
eg. test_000001.jpg,炸
Evaluation Metrics
Accuracy = N_ok / N, where N_ok is the number of characters predicted correctly and N is the total number of test characters.
Note the test image test_ReCTS_task1_000001.jpg should be renamed test_000001.jpg in the submission txt file.
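A minimal sketch of this metric in Python, assuming the ground truth is available in the same img_name,transcription format as the submission (the test ground truth itself is kept private by the organizers):

def task1_accuracy(pred_file, gt_file):
    def read(path):
        results = {}
        with open(path, encoding="utf-8") as f:
            for row in f:
                name, _, trans = row.strip().partition(",")
                results[name] = trans
        return results

    pred, gt = read(pred_file), read(gt_file)
    # Accuracy = N_ok / N
    n_ok = sum(pred.get(name) == trans for name, trans in gt.items())
    return n_ok / len(gt)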
Task 2. Text Line Recognition in the Signboard
The aim of this task is to recognize text from cropped text line images. The coordinates of the polygon bounding boxes within the images are also given. The input examples are shown in Figure 2.
Figure 2. Example cropped text line images
Submission Format
Participants will be asked to submit a txt file containing results for all test images. The results format is:
img_name,transcription
eg. test_000001.jpg,炸鸡
Evaluation Metrics
We use the Normalized Edit Distance (N.E.D.) as the evaluation metric for text line recognition, which is formulated as follows:

N.E.D. = (1/N) * Σ_{i=1..N} D(s_i, ŝ_i) / max(len(s_i), len(ŝ_i))

where D stands for the Levenshtein distance, s_i denotes the predicted text line, ŝ_i denotes the corresponding ground truth, and N is the total number of text lines. A smaller N.E.D. indicates better recognition performance.
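A minimal Python sketch of this metric, assuming predictions and ground truths have already been normalized (full-width folding and lower-casing) as described above:

def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(preds, gts):
    # N.E.D. = (1/N) * sum_i D(s_i, s_i_hat) / max(len(s_i), len(s_i_hat))
    total = 0.0
    for s, s_hat in zip(preds, gts):
        denom = max(len(s), len(s_hat)) or 1  # guard: both strings empty
        total += levenshtein(s, s_hat) / denom
    return total / len(gts)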
Note the test image test_ReCTS_task2_000001.jpg should be renamed test_000001.jpg in the submission txt file.
Task 3. Text Line Detection in the Signboard
The aim of this task is to localize text lines in the signboard. The input is the full signboard image.
Submission Format
Participants will be asked to submit a txt file containing results for all test images. The results format is:
img_name
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
img_name
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
......
eg.
test_000001.jpg
457,51,699,124,697,206,452,143
test_000002.jpg
test_000003.jpg
75,202,336,249,322,315,59,270
490,311,582,311,582,345,490,345
If no text boxes are detected for test_000002.jpg, write only the line test_000002.jpg in the file. The points should be in clockwise order. The test image test_ReCTS_task3_and_task_4_000001.jpg should be renamed test_000001.jpg in the submission txt file.
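A minimal Python sketch that writes this format (the function name and the result structure are illustrative):

def write_task3_submission(results, out_path):
    # results maps img_name -> list of 8-tuples (x1,y1,...,x4,y4),
    # each in clockwise order; images with no detections still get
    # their name line, as required above.
    with open(out_path, "w", encoding="utf-8") as f:
        for img_name, boxes in results.items():
            f.write(img_name + "\n")
            for box in boxes:
                f.write(",".join(str(int(v)) for v in box) + "\n")

write_task3_submission(
    {"test_000001.jpg": [(457, 51, 699, 124, 697, 206, 452, 143)],
     "test_000002.jpg": []},
    "task3_submission.txt")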
Evaluation Metrics
Following the evaluation protocol of the ICDAR 2017 RCTW [2] dataset, the detection task is evaluated in terms of Precision, Recall and F-score at IoU thresholds of 0.5 and 0.7.
The F-score at IoU=0.5 will be used as the only metric for the final ranking.
Ground truths marked as "ignore" do not contribute to the evaluation result, whether they are detected or missed.
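For reference, a sketch of polygon IoU and F-score computation (using the shapely library; this is not the official evaluation script):

from shapely.geometry import Polygon

def quad_iou(box_a, box_b):
    # IoU of two quadrilaterals given as (x1,y1,...,x4,y4).
    pa = Polygon(list(zip(box_a[0::2], box_a[1::2])))
    pb = Polygon(list(zip(box_b[0::2], box_b[1::2])))
    if not (pa.is_valid and pb.is_valid):
        return 0.0
    union = pa.union(pb).area
    return pa.intersection(pb).area / union if union > 0 else 0.0

def f_score(num_matched, num_pred, num_gt):
    # Precision, Recall and F-score over one-to-one matched boxes.
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)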
Task 4. End-to-End Text Spotting in the Signboard
The aim of this task is to localize and recognize every text instance in the signboard. The input is the full signboard image.
Submission Format
Participants will be asked to submit a txt file containing results for all test images. The results format is:
img_name
x1,y1,x2,y2,x3,y3,x4,y4,transcription
x1,y1,x2,y2,x3,y3,x4,y4,transcription
img_name
x1,y1,x2,y2,x3,y3,x4,y4,transcription
x1,y1,x2,y2,x3,y3,x4,y4,transcription
......
eg.
test_000001.jpg
457,51,699,124,697,206,452,143,所有锅
test_000002.jpg
test_000003.jpg
75,202,336,249,322,315,59,270,山里人
490,311,582,311,582,345,490,345,shanliren
If no text boxes are detected for test_000002.jpg, write only the line test_000002.jpg in the file. The points should be in clockwise order. The test image test_ReCTS_task3_and_task_4_000001.jpg should be renamed test_000001.jpg in the submission txt file.
Evaluation Metrics
First, each detection is matched to the ground-truth polygon with which it has the maximum IoU, or to 'None' if no IoU is larger than 0.5. If multiple detections are matched to the same ground truth, only the one with the maximum IoU is kept and the others are recorded as 'None'.
Then, we calculate the edit distances between all matched pairs (s_i, ŝ_i). We evaluate the predicted transcriptions with the Normalized Edit Distance (N.E.D.), which is formulated as:

N.E.D. = (1/N) * Σ_{i=1..N} D(s_i, ŝ_i) / max(len(s_i), len(ŝ_i))

where D stands for the Levenshtein distance, s_i denotes the predicted text line, ŝ_i denotes the corresponding ground truth, and N is the total number of text lines.
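A sketch of the matching step described above (reusing quad_iou from the Task 3 sketch; the edit distances of the resulting pairs then feed the N.E.D. formula, as in the Task 2 sketch):

def match_detections(dets, gts, iou_threshold=0.5):
    # dets, gts: lists of quads (x1,y1,...,x4,y4).
    # Returns {det_index: gt_index} for matched pairs; detections
    # absent from the mapping are recorded as 'None'.
    best_for_gt = {}  # gt index -> (iou, det index)
    for d, det in enumerate(dets):
        if not gts:
            break
        ious = [quad_iou(det, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda i: ious[i])
        # Keep only the detection with the maximum IoU per ground truth.
        if ious[g] > iou_threshold and ious[g] > best_for_gt.get(g, (0.0, -1))[0]:
            best_for_gt[g] = (ious[g], d)
    return {d: g for g, (iou, d) in best_for_gt.items()}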
Reference
[1] MSRA-TD500: C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu. Detecting Texts of Arbitrary Orientations in Natural Images. CVPR, 2012.
[2] RCTW: B. Shi, C. Yao, M. Liao, et al. ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17). ICDAR, 2017.
[3] SCUT-CTW1500: Y. Liu, L. Jin, S. Zhang, et al. Detecting Curve Text in the Wild: New Dataset and New Solution. arXiv, 2017.
[4] CTW: T. Yuan, Z. Zhu, K. Xu, et al. Chinese Text in the Wild. arXiv, 2018.
Important Dates
1 March 2019: Website ready and registration open
18 March 2019: Training set available
12 April 2019: The first part of test set available
20 April 2019: The second part of test set available, and Website opens for result submission
30 April 2019: Submission of results deadline
10 May 2019: Deadline for providing short descriptions of the participating methods
20-25 September 2019: Results presentation