Overview - HierText

Hierarchical Text: Challenge on Unified OCR and Layout Analysis

Historically, Optical Character Recognition (OCR) and layout analysis have been treated as separate tasks. OCR tasks consist of detecting and recognizing individual words or text lines, with no attention to the layout. Layout analysis tasks, on the other hand, focus only on the overall text structure, mainly in document images; they assume that OCR results from off-the-shelf methods are available and ignore the structure of text in natural images. We argue that OCR and layout analysis are equally indispensable and mutually complementary for computers to understand text in images. We also ask a forward-looking question: will consolidating the two tasks into a single system benefit the accuracy (and latency) of both?

To narrow the gap between the two research fields and facilitate future efforts in this direction, we collect and annotate a dataset called Hierarchical Text (HierText) [1]. The images are sampled from the Open Images dataset [2] and thus cover a wide variety of domains. The dataset features three levels of hierarchical annotation: word, line, and paragraph. It is also characterized by high word density (>100 words per image).


Fig 1. Dataset Overview.
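
As a rough illustration of the word, line, and paragraph hierarchy described above, the nesting can be sketched with a few Python dataclasses. The class names, field names, and the `count_words` helper are assumptions made for this sketch only; they are not the official annotation format, which is defined in the GitHub repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical types for illustration; not the official HierText schema.
Point = Tuple[int, int]


@dataclass
class Word:
    text: str
    vertices: List[Point]  # polygon around the word, in pixel coordinates


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A line's text is its words joined by spaces.
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A paragraph's text is its lines joined by newlines.
        return "\n".join(line.text for line in self.lines)


def count_words(paragraphs: List[Paragraph]) -> int:
    """Count words by walking paragraph -> line -> word."""
    return sum(len(line.words) for p in paragraphs for line in p.lines)


# A tiny made-up example: one paragraph with two lines.
box = [(0, 0), (40, 0), (40, 12), (0, 12)]
example = [
    Paragraph(lines=[
        Line(words=[Word("Hierarchical", box), Word("Text", box)]),
        Line(words=[Word("Challenge", box)]),
    ])
]

print(count_words(example))
print(example[0].text)
```

Nesting words inside lines and lines inside paragraphs means that layout information (which words belong together) is carried by the structure itself and does not need separate group identifiers.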

This site is mainly used for evaluation on the test set. Please visit our GitHub repository for the task definitions, annotation formats, download links, submission formats, and offline evaluation script. To evaluate a method on the test set, researchers can upload their inference results to this site via the "My Methods" tab.



  1. Long, Shangbang, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, and Michalis Raptis. "Towards End-to-End Unified Scene Text Detection and Layout Analysis." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1049-1059. 2022.
  2. Kuznetsova, Alina, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali et al. "The Open Images Dataset V4." International Journal of Computer Vision 128, no. 7 (2020): 1956-1981.


Important Dates