Tasks - ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

Dataset and Annotations

The dataset consists of 1000 whole scanned receipt images. Each receipt image contains around four key text fields, such as goods name, unit price and total cost. The text annotated in the dataset mainly consists of digits and English characters. An example scanned receipt is shown below:

[Example scanned receipt: sample21.jpg]

 

The dataset is split into a training/validation set (“trainval”) and a test set (“test”). The “trainval” set consists of 600 receipt images which will be made available to the participants along with their annotations. The “test” set consists of 400 images, which will be made available a few weeks before the submission deadline.

For the receipt OCR task, each image in the dataset is annotated with text bounding boxes (bboxes) and the transcript of each text bbox. Locations are annotated as rectangles with four vertices, given in clockwise order starting from the top-left. Annotations for an image are stored in a text file with the same file name. The annotation format is similar to that of the ICDAR 2015 dataset and is shown below:

x1_1,y1_1,x2_1,y2_1,x3_1,y3_1,x4_1,y4_1,transcript_1

x1_2,y1_2,x2_2,y2_2,x3_2,y3_2,x4_2,y4_2,transcript_2

x1_3,y1_3,x2_3,y2_3,x3_3,y3_3,x4_3,y4_3,transcript_3
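
To make the format concrete, here is a minimal Python sketch for parsing one annotation line. The helper name is hypothetical; the split assumes the first eight comma-separated fields are integer coordinates, while the transcript itself may contain commas:

# Parse one annotation line of the form
# x1,y1,x2,y2,x3,y3,x4,y4,transcript
# The transcript may itself contain commas, so split at most 8 times.
def parse_annotation_line(line: str):
    parts = line.strip().split(",", 8)
    coords = [int(p) for p in parts[:8]]
    transcript = parts[8].strip()
    # Group into four (x, y) vertices, clockwise from the top-left.
    vertices = list(zip(coords[0::2], coords[1::2]))
    return vertices, transcript

vertices, text = parse_annotation_line("72,25,326,25,326,64,72,64,TAN WOON YANN")
print(vertices)  # [(72, 25), (326, 25), (326, 64), (72, 64)]
print(text)      # TAN WOON YANN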

For the information extraction task, each image in the dataset is annotated with a JSON file in the format shown below:

{"Vt Pep Mocha": "4.95",

"Total": "$4.95",

"date": "14/03/2015",

   ……………

}
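
Reading such an annotation back is straightforward with Python's standard json module; the file name below is hypothetical:

import json

# Load the key-field annotation for one receipt (file name is hypothetical).
with open("sample21.json", encoding="utf-8") as f:
    fields = json.load(f)

for key, value in fields.items():
    print(f"{key}: {value}")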

 

Task 1 - Scanned Receipt OCR

Task Description

Localizing and recognizing text are conventional tasks that have appeared in many previous competitions, such as the ICDAR Robust Reading Competition (RRC) 2013, ICDAR RRC 2015 and ICDAR RRC 2017 [1][2]. The aim of this task is to accurately localize text with four vertices and recognize the text in the localized bounding boxes. The text localization ground truth will be at least at the word level. Participants will be asked to submit a zip file containing results for all test images.

Evaluation Protocol

As participating teams may apply localization algorithms that locate text at different levels (e.g. text lines), the evaluation of text localization in this task will follow a methodology based on DetEval, which addresses one-to-many and many-to-one correspondences between detected and ground-truth texts. In our evaluation protocol, mean average precision (mAP) and average recall will be calculated, based on which an F1 score will be computed and used for ranking [3]. A detected text is marked as a true positive if 1) its IoU with the ground truth box(es) is larger than a given threshold; 2) the ground truth box has not already been matched to another detection; and 3) the recognized text in the detected box matches the ground truth text.
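
As a rough illustration of the matching arithmetic, the sketch below computes IoU for axis-aligned boxes and an F1 score from precision and recall. This is a simplification and an assumption on our part: the official protocol evaluates general quadrilaterals and resolves one-to-many and many-to-one correspondences.

# IoU for axis-aligned boxes given as (x_min, y_min, x_max, y_max).
# The official protocol handles general quadrilaterals; this is a
# simplified illustration only.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... -> below a 0.5 threshold
print(f1_score(0.9, 0.8))                   # 0.847...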

 

Task 2 - Key Information Extraction from Scanned Receipts

Task Description

The aim of this task is to extract the texts of a number of key fields from given receipts, and to save the texts for each receipt image in a JSON file with the format shown in the Dataset and Annotations section above. Participants will be asked to submit a zip file containing results for all test receipt images.

Evaluation Protocol

For each test receipt image, the extracted text is compared to the ground truth. An extracted text is marked as correct if both the content and the category of the extracted text match the ground truth; otherwise, it is marked as incorrect. The mAP is computed over all the extracted texts of all the test receipt images. An F1 score is computed based on mAP and recall, and is used for ranking.
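
Below is a minimal sketch of the per-receipt correctness check, under the assumption that a submitted field counts as correct only when both its category (the key) and content (the value) exactly match the ground truth; the function name is hypothetical.

# Score one receipt: a predicted field is correct only if both key
# (category) and value (content) exactly match the ground truth.
def score_receipt(predicted: dict, ground_truth: dict):
    correct = sum(1 for k, v in predicted.items() if ground_truth.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall

gt = {"total": "$4.95", "date": "14/03/2015"}
pred = {"total": "$4.95", "date": "14/03/2016"}  # date is wrong
print(score_receipt(pred, gt))  # (0.5, 0.5)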

 

Important Dates

Registration open: February 10 – March 31, 2019

Training/validation dataset available: March 1, 2019

Submission open: April 15, 2019

Deadline for Competition participants: April 30, 2019