Results - Hierarchical Text: Challenge on Unified OCR and Layout Analysis

method: Global and local instance segmentations for hierarchical text detection (2023-04-01)

Authors: Xingran Zhao, Jing Xian, Yadong Li, Hongbin Wang

Affiliation: AntGroup

Email: zhaoxingran.zxr@antgroup.com;xianjing.xj@antgroup;liyadong.lyd@antgroup.com;hongbin.whb@antgroup.com

Description: For word and line detection, we first crop patches from the images to obtain local mask results. Second, we obtain global mask results using the full images as input. Third, we merge the global and local results with an NMS post-processing step. For paragraph detection, we use only the full images as input and obtain global mask results. All detectors are CBNetV2 [1] with HTC [2]. For hierarchical text detection, we use IOS (intersection-of-sets) as the metric for assigning words to lines, and apply the same strategy to assign lines to paragraphs.
[1] CBNetV2: A Composite Backbone Network Architecture for Object Detection.
[2] Hybrid Task Cascade for Instance Segmentation.
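
The merge-and-assign pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' code: it simplifies instance masks to axis-aligned boxes (x0, y0, x1, y1), assumes local patch detections have already been shifted back to full-image coordinates, interprets IOS as the intersection area divided by the child region's own area, and uses hypothetical helper names (merge_with_nms, assign) and thresholds (0.5).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); stands in for a mask
Det = Tuple[Box, float]                  # (box, confidence score)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def merge_with_nms(global_dets: List[Det], local_dets: List[Det],
                   iou_thr: float = 0.5) -> List[Det]:
    """Greedy NMS over the pooled global and local detections: keep the
    highest-scoring detection and drop any later one whose IoU with an
    already kept detection reaches iou_thr (hypothetical threshold)."""
    dets = sorted(global_dets + local_dets, key=lambda d: d[1], reverse=True)
    kept: List[Det] = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


def ios(child: Box, parent: Box) -> float:
    """Assumed IOS: intersection area divided by the child's own area."""
    a = area(child)
    return intersection(child, parent) / a if a > 0 else 0.0


def assign(children: List[Box], parents: List[Box], thr: float = 0.5) -> List[int]:
    """For each child (e.g. word), return the index of the parent (e.g. line)
    with the highest IOS, or -1 if no parent reaches thr."""
    out = []
    for c in children:
        scores = [ios(c, p) for p in parents]
        best = max(range(len(parents)), key=scores.__getitem__, default=-1)
        out.append(best if best >= 0 and scores[best] >= thr else -1)
    return out


kept = merge_with_nms([((0, 0, 10, 10), 0.9)],
                      [((1, 0, 11, 10), 0.8), ((20, 0, 30, 10), 0.7)])
print([b for b, _ in kept])
print(assign([(0, 0, 4, 10), (6, 0, 10, 10), (50, 50, 60, 60)],
             [(0, 0, 10, 10), (20, 0, 30, 10)]))
```

Pooling both passes before suppression means a word found in both the global and the local pass survives once, with its higher score. Because IOS normalizes by the child's area, a small word fully inside a long line scores 1.0 even though its IoU with that line would be low, which is why an IoU-style metric is a poor fit for the assignment step.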

method: Hi-SAM (2023-12-28)

Authors: Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

Description: A unified text segmentation model covering four hierarchies (stroke, word, text line, and paragraph) that also performs layout analysis. Only the HierText training data is used.

@article{ye2024hi-sam, title={Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation}, author={Ye, Maoyuan and Zhang, Jing and Liu, Juhua and Liu, Chenyu and Yin, Baocai and Liu, Cong and Du, Bo and Tao, Dacheng}, journal={arXiv preprint arXiv:2401.17904}, year={2024} }

method: Unified Detector (CVPR 2022 version) (2022-08-09)

Authors: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis

Affiliation: Google Research

Description: This official submission accompanies our paper, Towards End-to-End Unified Scene Text Detection and Layout Analysis. Note that the Unified Detector model produces line-level masks and a line-wise affinity matrix that groups lines into paragraphs; it cannot produce word-level detections directly. For evaluation purposes only, we use heuristics to extract word masks from line masks. Please refer to the source code (https://github.com/tensorflow/models/tree/master/official/projects/unified_detector#demo-on-single-images) to see how this is done.

Long, S., Qin, S., Panteleev, D., Bissacco, A., Fujii, Y., & Raptis, M. (2022). Towards End-to-End Unified Scene Text Detection and Layout Analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1049-1059).
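
One simple way to turn a line-wise affinity matrix into paragraph groups is to threshold it and take connected components. The sketch below is a generic illustration, not the Unified Detector's actual grouping procedure (the linked source code defines that); the threshold of 0.5, the symmetrization by averaging both directions, and the function name group_lines are assumptions.

```python
from typing import Dict, List


def group_lines(affinity: List[List[float]], thr: float = 0.5) -> List[List[int]]:
    """Group line indices into paragraphs: lines i and j are linked when their
    averaged affinity reaches thr, and paragraphs are the connected components
    (computed with union-find)."""
    n = len(affinity)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if (affinity[i][j] + affinity[j][i]) / 2 >= thr:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


A = [[1.0, 0.9, 0.1, 0.0],
     [0.9, 1.0, 0.2, 0.1],
     [0.1, 0.2, 1.0, 0.8],
     [0.0, 0.1, 0.8, 1.0]]
print(group_lines(A))
```

Connected components are transitive: if line 0 links to line 1 and line 1 links to line 2, all three end up in one paragraph even when the direct affinity between 0 and 2 is low.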

Ranking Table

Date	Method	Word PQ	Word F-score	Word Precision	Word Recall	Word Tightness	Line PQ	Line F-score	Line Precision	Line Recall	Line Tightness	Paragraph PQ	Paragraph F-score	Paragraph Precision	Paragraph Recall	Paragraph Tightness
2023-04-01	Global and local instance segmentations for hierarchical text detection	0.7616	0.9072	0.9345	0.8816	0.8395	0.6850	0.8222	0.8024	0.8431	0.8331	0.6255	0.7511	0.7400	0.7625	0.8328
2023-12-28	Hi-SAM	0.6430	0.8286	0.8766	0.7856	0.7760	0.6696	0.8530	0.9109	0.8020	0.7850	0.5909	0.7597	0.8152	0.7113	0.7779
2022-08-09	Unified Detector (CVPR 2022 version)	0.4821	0.6151	0.6754	0.5647	0.7838	0.6223	0.7991	0.7964	0.8019	0.7787	0.5360	0.6858	0.7604	0.6245	0.7817
2023-02-06	HierText official ckpt	0.4799	0.6135	0.6719	0.5645	0.7822	0.6220	0.7998	0.8000	0.7996	0.7777	0.5351	0.6856	0.7654	0.6208	0.7805
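
The reported numbers are consistent with F-score being the harmonic mean of Precision and Recall, and PQ being the product of F-score and Tightness (as in panoptic quality, where recognition quality is multiplied by the mean IoU of matched pairs, here understood to be Tightness). The short Python check below recomputes both quantities for three rows of the table; differences in the fourth decimal come from rounding in the reported values.

```python
def f_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def pq(f: float, tightness: float) -> float:
    # Panoptic-quality style product: recognition quality x segmentation quality.
    return f * tightness


# (Precision, Recall, reported F-score, Tightness, reported PQ), copied from the table.
ROWS = {
    "Global and local, Word": (0.9345, 0.8816, 0.9072, 0.8395, 0.7616),
    "Hi-SAM, Line": (0.9109, 0.8020, 0.8530, 0.7850, 0.6696),
    "Unified Detector, Paragraph": (0.7604, 0.6245, 0.6858, 0.7817, 0.5360),
}

for name, (p, r, f, t, q) in ROWS.items():
    print(f"{name}: F={f_score(p, r):.4f} (reported {f}), "
          f"PQ={pq(f, t):.4f} (reported {q})")
```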
