Overview - Hierarchical Text: Challenge on Unified OCR and Layout Analysis

a.k.a.

ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

Introduction

Historically, Optical Character Recognition (OCR) and layout analysis have been treated as separate tasks. OCR tasks consist of detecting and recognizing individual words or text lines, with no attention paid to layout. Layout analysis tasks, on the other hand, focus only on the overall text structure, mainly in document images; they assume that OCR results are available from off-the-shelf methods and ignore the structure of text in natural images. We argue that OCR and layout analysis are equally indispensable and mutually complementary for computers to understand text in images. We also ask a forward-looking question: will consolidating the two tasks into a single system benefit the accuracy (and latency) of both?

To narrow the gap between the two research fields and facilitate future efforts in this direction, we collected and annotated a dataset called Hierarchical Text (HierText) [1]. The images are sampled from the Open Images dataset [2] and thus cover a wide variety of domains. The dataset features hierarchical annotation at 3 levels: word, line, and paragraph. It is also characterized by high word density (>100 words per image).

Fig 1. Dataset Overview.

Fig 2. The notion of hierarchical text representation.
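
As a rough illustration of the three-level hierarchy (word, line, paragraph), the nesting can be sketched in Python as below. This is only a minimal sketch: the class names, field names, and the `count_words` helper are hypothetical and chosen for illustration. The official annotation format is defined in the GitHub repository and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A polygon vertex as (x, y) pixel coordinates (assumed convention).
Point = Tuple[int, int]


@dataclass
class Word:
    vertices: List[Point]  # polygon around the word
    text: str              # transcription of the word


@dataclass
class Line:
    vertices: List[Point]                      # polygon around the line
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A line's text is its words joined by spaces.
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    vertices: List[Point]                      # polygon around the paragraph
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A paragraph's text is its lines joined by newlines.
        return "\n".join(line.text for line in self.lines)


def count_words(paragraphs: List[Paragraph]) -> int:
    """Count words across all paragraphs of one image."""
    return sum(len(line.words) for p in paragraphs for line in p.lines)


# Hypothetical example: one paragraph containing two lines of two words each.
box = [(0, 0), (10, 0), (10, 10), (0, 10)]  # placeholder polygon
example = Paragraph(
    vertices=box,
    lines=[
        Line(vertices=box, words=[Word(box, "OPEN"), Word(box, "DAILY")]),
        Line(vertices=box, words=[Word(box, "9am"), Word(box, "5pm")]),
    ],
)
```

With this layout, word-level, line-level, and paragraph-level outputs can all be read from the same nested object, which is the idea behind treating OCR and layout analysis as one task.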

Given these open challenges and the promising potential benefits, we propose the ICDAR 2023 Competition on Hierarchical Text Detection and Recognition. See the Tasks tab for an introduction to the competition tracks. Please also visit our GitHub repository for the dataset definition, annotation formats, download links, submission formats, and offline evaluation script.

To evaluate methods on the test set, researchers can upload their test set inference results to this site via the "My Methods" tab. For evaluation on the validation set, researchers can use the validation ground truths and the offline evaluation script in the GitHub repository. During the competition period, test set results will remain hidden until the end of the competition.

Note: if you do not wish to participate in the competition but still want to evaluate your methods on HierText (e.g. in your research paper), you can request an evaluation by email. First submit your inference results via this website, then send us an email from your institutional address (e.g. an edu or corporate email) stating your real name. After verification, we will send the evaluation results back to you.

FAQ:

Q1: Can I use other public datasets?

A: No and yes. HierText is the only annotated OCR dataset allowed. However, feel free to do self-labeling on other public datasets: you can use their images, but not their labels. You are also encouraged to submit both sets of results, one with self-labeled data and one without, so that we can better understand the effect of self-labeling.

In addition, you can use annotated image classification / object detection / segmentation datasets, e.g. for pretraining, as long as the annotations are not for OCR.

Q2: Can I use synthetic datasets?

A: Feel free to use any synthetic datasets, whether public or private. In this case, please describe how they were synthesized in your method introduction (for reporting purposes).

Q3: Can I use the validation split to train my model?

A: No. The validation split must not be used for training models.

Q4: Can I make multiple submissions?

A: Yes, the system allows you to keep multiple submissions. However, we will use the latest submission to determine your final score.

References

  1. Long, Shangbang, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, and Michalis Raptis. "Towards End-to-End Unified Scene Text Detection and Layout Analysis." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1049-1059. 2022.
  2. Kuznetsova, Alina, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali et al. "The open images dataset v4." International Journal of Computer Vision 128, no. 7 (2020): 1956-1981.

 

Important Dates

All dates are final.

- 2023 Jan 2nd: Start of the competition; submission of results opens.

- 2023 Apr 1st 23:59 PST: Deadline for submissions to the ICDAR 2023 Competition.

- 2023 Apr 15th: Release of competition results.