Tasks - ICDAR 2017 Challenge on Text Extraction from Biomedical Literature Figures

DeTEXT is collected from PubMed Central. It comprises 500 typical biomedical literature figures drawn from about 300 randomly selected full-text articles. The dataset is divided into three non-overlapping subsets: training, validation and testing. Details are shown in Table 1 below.

Table 1. Descriptions of training, validation and testing sets in DeTEXT, where full-text articles are randomly selected from PubMed Central.

Set              No. of figures   No. of articles   Remarks
Training set     100              100               Select one figure for each article.
Validation set   100              45                Randomly select articles, then include all common figures in these articles, until 100 figures are reached.
Testing set      300              143               Randomly select articles, then include all common figures in these articles, until 300 figures are reached.

More details can be found in our dataset paper [1].

Note that, in the testing set, every text region is annotated as a single word; if a region is not clearly legible, it is marked as "Don't Care". In the training and validation sets, most text regions are annotated as word regions; however, due to time constraints, in a small part of the training and validation sets one text region may include several words on a line.

The challenge is set up around three tasks:

  • Text Localisation, where the objective is to obtain a rough estimation of the text areas in the image, in terms of bounding boxes that correspond to words. Each bounding box is specified, with its location and orientation, by its four vertices, i.e., the top-left, top-right, bottom-right and bottom-left points.

  • Cropped Word Recognition, where the locations (bounding boxes) of words in the image are assumed to be known and the corresponding text transcriptions are sought.

  • End-to-End Recognition, where the objective is to localise and recognise all words in the image in a single step.

Task 1: Text Localisation (Text Detection)

The aim of this task is to accurately localise text by text block bounding boxes. Participants will be asked to run their systems to localise every text block on every testing image.

Ground Truth Format

The ground truth is provided in terms of text block bounding boxes. Bounding boxes are NOT axis-oriented; they are specified by the coordinates of their four corners in clockwise order. For each image in the training set, a corresponding UTF-8 encoded text file is provided, following the naming convention:

gt_[image name].txt

The text files are comma-separated; each line corresponds to one text block in the image and gives its bounding box coordinates (four corners, clockwise) and its transcription in the format:

x1, y1, x2, y2, x3, y3, x4, y4, transcription

Note that the transcription is anything that follows the 8th comma until the end of line. No escape characters are to be used.
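
As an illustration, a minimal Python sketch for reading such a ground-truth file is given below. The file and function names are our own, not part of any official toolkit.

# Sketch: parse one Task 1 ground-truth file (e.g. gt_1245.txt).
def parse_gt_line(line):
    # Split on the first 8 commas only: the transcription itself may
    # contain commas, and no escape characters are used.
    parts = line.rstrip("\n").split(",", 8)
    coords = [int(p.strip()) for p in parts[:8]]
    transcription = parts[8].strip()
    # Corners are listed clockwise: (x1, y1) ... (x4, y4).
    points = list(zip(coords[0::2], coords[1::2]))
    return points, transcription

with open("gt_1245.txt", encoding="utf-8") as f:
    for line in f:
        points, text = parse_gt_line(line)
        print(points, text)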

Results Format

Localisation results are expected in a format similar to the ground truth: one UTF-8 encoded text file per test image. Participants will be asked to submit all results in a single zip file. Result files should be named after test image IDs, following the naming convention:

res_[image name].txt 

(e.g. res_1245.txt). Each line should correspond to one text block in the image and provide its bounding box coordinates (four corners, clockwise) and a confidence score in the format:

x1, y1, x2, y2, x3, y3, x4, y4, confidence

A confidence score must be included for every bounding box. Note that the four corner points are listed in clockwise order.
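
A minimal sketch of how a submission could be assembled, assuming detections are held in memory as (points, confidence) pairs; the variable and file names below are illustrative:

import zipfile

# Illustrative detections: image ID -> list of (corner points, confidence).
detections = {"1245": [([(10, 20), (110, 20), (110, 50), (10, 50)], 0.93)]}

result_files = []
for image_id, boxes in detections.items():
    path = "res_{}.txt".format(image_id)
    with open(path, "w", encoding="utf-8") as f:
        for points, confidence in boxes:
            flat = ",".join("{},{}".format(x, y) for x, y in points)
            f.write("{},{:.4f}\n".format(flat, confidence))
    result_files.append(path)

# All per-image result files go into a single zip file.
with zipfile.ZipFile("results.zip", "w") as zf:
    for path in result_files:
        zf.write(path)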

Task 2: Cropped Text Blocks Recognition

The aim of this task is to recognise cropped text block images, i.e., to transcribe them into character sequences. The cropped boxes are the annotated text block boxes padded by 3 pixels on all sides. Only legible English text blocks and numbers will be considered in this task.

The recognition will be unconstrained, meaning that there will be no lexicon provided.
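
To make the cropping concrete, the sketch below pads the axis-aligned extent of an annotated quadrilateral by 3 pixels and crops it with Pillow. The border clamping and file names are our assumptions, not a description of how the official crops were produced.

from PIL import Image

def crop_text_block(image, points, pad=3):
    # Axis-aligned extent of the annotated quadrilateral, padded by
    # `pad` pixels on all sides and clamped to the image borders
    # (the clamping behaviour is an assumption).
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    box = (max(min(xs) - pad, 0), max(min(ys) - pad, 0),
           min(max(xs) + pad, image.width), min(max(ys) + pad, image.height))
    return image.crop(box)

figure = Image.open("figure_0001.png")  # illustrative file name
block = crop_text_block(figure, [(40, 12), (120, 12), (120, 34), (40, 34)])
block.save("block_0001.png")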

Ground Truth Format

For the text block recognition task, we provide all the text blocks in our dataset with 3 or more characters in separate image files, along with the corresponding ground-truth transcriptions. For each text block, the axis-oriented area that tightly contains the text block is provided.

The transcription of all text blocks is provided in a SINGLE UTF-8 text file for the whole collection. Each line in the ground truth file has the following format:

[word image name], transcription

Note that the transcription is anything that follows the 1st comma until the end of line. No escape characters are to be used.

In addition, the relative coordinates of the (non-axis-oriented) bounding box that defines the text block within the cut-out text block image are provided in a separate SINGLE text file for the whole collection. The coordinates of the text blocks are given in reference to the cut-out box, as the four corners of the bounding box in clockwise order. Each line in the ground truth file has the following format:

[word image name], x1, y1, x2, y2, x3, y3, x4, y4
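
A minimal sketch for reading both ground-truth files follows; the file names are placeholders, not the names used in the actual release.

def load_transcriptions(path):
    # [word image name], transcription -- split on the first comma only,
    # since transcriptions may themselves contain commas.
    gt = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, transcription = line.rstrip("\n").split(",", 1)
            gt[name.strip()] = transcription.strip()
    return gt

def load_box_coordinates(path):
    # [word image name], x1, y1, ..., x4, y4 -- four corners, clockwise,
    # relative to the cut-out text block image.
    boxes = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(",")
            coords = [int(p) for p in parts[1:9]]
            boxes[parts[0].strip()] = list(zip(coords[0::2], coords[1::2]))
    return boxes

transcriptions = load_transcriptions("gt_transcriptions.txt")  # placeholder name
boxes = load_box_coordinates("gt_coordinates.txt")             # placeholder name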

Results Format

For testing, we will provide the cropped images of text blocks and ask for the transcription of each image. A single transcription per image is requested. Participants should return all transcriptions in a single UTF-8 encoded text file, in the same format as the ground truth:

[word image name], transcription

Note that the transcription is anything that follows the 1st comma until the end of line. No escape characters are to be used.
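
Assembling the submission file is then a matter of writing one line per word image; the predictions and output file name below are illustrative:

# Illustrative predictions: word image name -> transcribed text.
predictions = {"word_0001.png": "p-value", "word_0002.png": "0.05"}

with open("task2_results.txt", "w", encoding="utf-8") as f:
    for name, transcription in predictions.items():
        f.write("{}, {}\n".format(name, transcription))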

Task 3: End-to-End Recognition

The aim of this task is to both localise and recognise text blocks in images. Only legible English text blocks that comprise three or more characters are considered. The rest are treated as “don’t care” objects.

Ground Truth Format

The ground truth is provided in the same format as in Task 1.

Results Format

End-to-end results should be provided in a format similar to Task 1. A single zip file is expected, with result files named after test image IDs following the convention:

res_[image name].txt 

Inside each text file, a list of detected bounding boxes, along with the confidence of the detection and the transcription should be provided:

x1, y1, x2, y2, x3, y3, x4, y4, confidence, transcription

Note that the transcription is anything that follows the 9th comma (after the eight coordinates and the confidence score) until the end of line. No escape characters are to be used.
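
A short sketch of parsing such a result line under this comma convention (the helper name is illustrative):

def parse_end_to_end_line(line):
    # Eight coordinates and a confidence score precede the transcription,
    # so the transcription is everything after the 9th comma.
    parts = line.rstrip("\n").split(",", 9)
    coords = [int(p) for p in parts[:8]]
    confidence = float(parts[8])
    transcription = parts[9].strip()
    points = list(zip(coords[0::2], coords[1::2]))
    return points, confidence, transcription

points, confidence, text = parse_end_to_end_line("10,20,110,20,110,50,10,50,0.93,p-value")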

Evaluation Metrics

The evaluation metrics of all tasks (Tasks 1, 2 and 3) of the ICDAR 2017 DeTEXT Competition are similar to those of the ICDAR 2017 COCO-Text Challenge, the only difference being that 4-point boxes are used here (instead of the 2-point, axis-oriented boxes of the COCO-Text Challenge).
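
To make the 4-point matching concrete, below is a minimal sketch of intersection-over-union between two quadrilaterals using the shapely library. This only illustrates the underlying geometry; it is not the official evaluation code.

from shapely.geometry import Polygon

def quad_iou(points_a, points_b):
    # Intersection-over-union of two 4-point (quadrilateral) boxes.
    a, b = Polygon(points_a), Polygon(points_b)
    if not (a.is_valid and b.is_valid):
        return 0.0
    inter = a.intersection(b).area
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0

gt = [(10, 10), (100, 10), (100, 40), (10, 40)]
det = [(12, 8), (98, 12), (102, 42), (8, 38)]
print(quad_iou(gt, det))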

References

[1] Xu-Cheng Yin, Chun Yang, Wei-Yi Pei, Haixia Man, Jun Zhang, Erik Learned-Miller, and Hong Yu, "DeTEXT: A database for evaluating text extraction from biomedical literature figures," PLoS ONE, vol. 10, no. 5, e0126200, 2015.

Important Dates

  • April 1: Datasets available.

  • June 10: Testing set release.

  • June 30: Deadline for submission of results.

  • November 10-15: Results presentation.