Tasks - ICDAR 2023 Competition on Reading the Seal Title

Our proposed competition consists of two main tasks:

  1. Seal title text detection
  2. End-to-end seal title recognition

Dataset and Annotations

We name our dataset ReST, as it focuses on Reading Seal Title text. It contains 10,000 images in total, collected from real scenes, as shown in Figure 1. The data is mainly in Chinese, with English data accounting for 1%.


Figure 1. Example images of real seal data.

The dataset covers the most common classes of seals:

Circle/Ellipse shapes: This type of seal commonly appears in official seals, invoice seals, contract seals, and bank seals.

Rectangle shapes: This type of seal is commonly seen in driving licenses, corporate seals, and medical bills.

Triangle shapes: This type of seal is seen on bank receipts and in other financial settings. Triangle seals are uncommon, so the dataset contains only a small amount of such data.

The dataset is split evenly into a training set and a test set. Every image in the dataset is annotated with text line locations and labels. Locations are annotated as polygons whose vertices are given in clockwise order. Transcripts are UTF-8 encoded strings. Annotations are stored in a JSON file whose keys follow the naming convention gt_[image_id], where image_id refers to the index of the image in the dataset.

In the JSON file, each gt_[image_id] key corresponds to a list, where each entry in the list corresponds to one text instance in the image and gives its bounding polygon coordinates and transcription, in the following format:

{
    "gt_1": [
        {"points": [[x1, y1], [x2, y2], …, [xn, yn]], "transcription": "trans1"}
    ],
    "gt_2": [
        {"points": [[x1, y1], [x2, y2], …, [xn, yn]], "transcription": "trans2"}
    ],
    ……
}

where x1, y1, x2, y2, …, xn, yn in "points" are the vertex coordinates of the polygon bounding box, and "transcription" denotes the text of each text line.
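For concreteness, the following is a minimal sketch of how the ground truth could be parsed, assuming the annotations ship as a single JSON file (named gt.json here purely for illustration):

import json

# Load the annotation file; the name "gt.json" is an assumption for this sketch.
with open("gt.json", encoding="utf-8") as f:
    annotations = json.load(f)

for image_key, instances in annotations.items():  # keys like "gt_1", "gt_2", ...
    for inst in instances:
        polygon = inst["points"]       # [[x1, y1], ..., [xn, yn]], clockwise
        text = inst["transcription"]   # UTF-8 title text
        print(image_key, len(polygon), "vertices:", text)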

Note: There may be some inaccurate annotations in the training set; these can serve to measure the robustness of algorithms, and participants may filter this part of the data as appropriate. The test set has been manually corrected, so its annotations are accurate.

Task 1. Seal Title Text Detection

The aim of this task is to localize the title text in a seal image. Input examples are shown in Figure 2.


Figure 2. Example images of the Seal Text dataset. Green bounding lines show the polygon ground-truth format.

Submission Format

Participants will be asked to submit a JSON file containing results for all test images. The results format is:

{
    "res_1": [
        {"points": [[x1, y1], [x2, y2], …, [xn, yn]], "confidence": c}
    ],
    "res_2": [
        {"points": [[x1, y1], [x2, y2], …, [xn, yn]], "confidence": c}
    ],
    ……
}

where the keys of the JSON file should adhere to the format res_[image_id]. Here, n is the total number of vertices (which need not be fixed and may vary across predicted text instances), and c is the confidence score of the prediction, in the range 0-1.
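As an illustration, a submission file in this format could be assembled as follows; detect() is a hypothetical stand-in for a participant's own detector, and the image paths are placeholders:

import json

def detect(image_path):
    # Placeholder detector; returns a list of (polygon, confidence) pairs.
    return [([[10, 10], [100, 10], [100, 50], [10, 50]], 0.9)]

test_images = ["test_1.jpg", "test_2.jpg"]  # illustrative paths

results = {
    f"res_{i}": [
        {"points": poly, "confidence": conf}
        for poly, conf in detect(path)
    ]
    for i, path in enumerate(test_images, start=1)
}

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(results, f)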

Evaluation Protocol

For Task 1, we adopt the IoU-based evaluation protocol of CTW1500 [1]. IoU is a threshold-based evaluation protocol, with 0.5 as the default threshold. We will report results at the 0.5 and 0.7 thresholds, but only the H-Mean at 0.7 will be treated as the final score for each submitted model and used for ranking submissions. To ensure fairness, competitors are required to submit a confidence score for each detection, so that we can iterate over all confidence thresholds to find the best H-Mean score. In the case of multiple matches, we only consider the detection region with the highest IoU; the remaining matches are counted as false positives. Precision, Recall, and F-score are calculated as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F = 2 × Precision × Recall / (Precision + Recall)

where TP, FP, FN and F denote true positives, false positives, false negatives and the H-Mean, respectively.
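The matching scheme described above can be sketched as follows. This is an illustration of the protocol rather than the official evaluation script; it uses the third-party shapely library for polygon geometry and assumes detections are processed in descending confidence order:

from shapely.geometry import Polygon

def evaluate_image(dets, gts, iou_thr=0.7):
    # dets, gts: lists of polygons, each [[x1, y1], ..., [xn, yn]].
    # dets are assumed sorted by descending confidence.
    gt_polys = [Polygon(g) for g in gts]
    matched = set()
    tp = 0
    for d in dets:
        dp = Polygon(d)
        # Match the detection to the unmatched ground truth with the highest IoU;
        # extra detections on an already-matched region count as false positives.
        best_iou, best_j = 0.0, -1
        for j, gp in enumerate(gt_polys):
            if j in matched:
                continue
            iou = dp.intersection(gp).area / dp.union(gp).area
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thr:
            tp += 1
            matched.add(best_j)
    precision = tp / len(dets) if dets else 0.0  # TP / (TP + FP)
    recall = tp / len(gts) if gts else 0.0       # TP / (TP + FN)
    hmean = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
    return precision, recall, hmean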

Task 2. End-to-end Seal Title Recognition

The main objective of this task is to extract the title of a seal. As shown in Figure 3, the input is a whole seal image and the output is the seal's title.


Figure 3. Example of the Task 2 input and output.

Submission Format

For Task 2, participants are required to submit the predicted titles for all the images in a single JSON file:

{
    "res_1": [{"transcription": "title1"}],
    "res_2": [{"transcription": "title2"}],
    "res_3": [{"transcription": "title3"}],
    ……
}

where the keys of the JSON file should adhere to the format res_[image_id].
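A submission file in this format could be produced as follows; recognize() is a hypothetical stand-in for a participant's recognition model, and since most titles are Chinese, ensure_ascii=False keeps the text readable in the output file:

import json

def recognize(image_path):
    # Placeholder recognizer; returns the predicted title string.
    return "示例标题"  # illustrative output ("example title")

test_images = ["test_1.jpg", "test_2.jpg"]  # illustrative paths

results = {
    f"res_{i}": [{"transcription": recognize(path)}]
    for i, path in enumerate(test_images, start=1)
}

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False)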

Evaluation Protocol

The metric for this task is case-insensitive word accuracy: the ratio of correctly predicted titles to the total number of titles.
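In other words, a prediction counts as correct only if the whole title matches the ground truth exactly, ignoring case (case only matters for the small English portion of the data). A minimal sketch, with illustrative function and variable names:

def word_accuracy(predictions, ground_truth):
    # predictions and ground_truth map keys like "res_1" to title strings.
    correct = sum(
        1
        for key, gt_title in ground_truth.items()
        if predictions.get(key, "").lower() == gt_title.lower()
    )
    return correct / len(ground_truth)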

References

[1] Liu, Yuliang, Lianwen Jin, et al. "Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection." Pattern Recognition, 2019.

[2] Chng, Chee-Kheng, et al. "ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT." 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019.

Important Dates

Note: The time zone of all deadlines is UTC-12. The cut-off time for all dates is 11:59 PM.

December 23-30, 2022: Website ready

December 30, 2022: Training set images and ground truth available

March 5, 2023: Task 1&2 test set images available

March 6, 2023: Task 1&2 submission open

March 20, 2023: Task 1&2 submission deadline

 

Note on the registration for the ReST challenge:

There is no need to register explicitly for the ReST challenge. As long as you are registered on the RRC portal, you will be able to submit your results once submission opens.