method: Upstage KR2023-04-01

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].
We pretrain the text recognizer on synthetic data before fine-tuning it on the HierText dataset. We use an in-house synthetic data generator, derived from the open-source SynthTiger [3], to generate word images from English and Korean corpora. We generate 10M English/Korean word images with a horizontal layout and 5M with a vertical layout. For the final submission, we use an ensemble of three text recognizers for strong and stable performance.
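The cascade above can be sketched as a two-stage loop: detect word boxes, crop each one, then merge the three recognizers' outputs. This is a minimal sketch, not the authors' implementation; the per-word majority vote and the `detect`/`crop`/recognizer callables are assumptions, since the description does not specify how the ensemble combines predictions.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over one word's predictions from several recognizers.
    Ties are broken by recognizer order (Counter keeps insertion order)."""
    return Counter(predictions).most_common(1)[0][0]

def recognize_cascade(image, detect, crop, recognizers):
    """Two-stage cascade: detect word boxes, crop each box out of the image,
    run every recognizer on the crop, and merge outputs by majority vote."""
    return [vote([rec(crop(image, box)) for rec in recognizers])
            for box in detect(image)]
```

With stub detector and recognizers, `recognize_cascade` returns one string per detected box, so a disagreement among the three recognizers on a word is resolved by the two that agree.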

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.
[3] Yim, M., Kim, Y., Cho, H.-C., & Park, S. (2021). SynthTIGER: Synthetic text image generator towards better text recognition models. In ICDAR 2021.

method: Upstage KR2023-03-30

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.

method: Upstage KR2023-03-31

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.

Ranking Table

Word
Date       | Method                                                 | PQ     | Fscore | Precision | Recall | Tightness
2023-04-01 | Upstage KR                                             | 0.7000 | 0.7958 | 0.8205    | 0.7725 | 0.8797
2023-03-30 | Upstage KR                                             | 0.6961 | 0.7888 | 0.8197    | 0.7602 | 0.8825
2023-03-31 | Upstage KR                                             | 0.6961 | 0.7888 | 0.8197    | 0.7602 | 0.8825
2023-04-02 | DeepSE End-to-End Text Detection and Recognition Model | 0.6746 | 0.7793 | 0.8805    | 0.6989 | 0.8657
2023-04-02 | DeepSE End-to-End Text Detection and Recognition Model | 0.6746 | 0.7793 | 0.8805    | 0.6989 | 0.8657
2023-03-24 | NVTextSpotter                                          | 0.6357 | 0.7410 | 0.8094    | 0.6834 | 0.8578
2023-03-17 | NVTextSpotter                                          | 0.6187 | 0.7232 | 0.8054    | 0.6562 | 0.8555
2023-04-01 | Clova DEER                                             | 0.6070 | 0.7695 | 0.7791    | 0.7602 | 0.7889
2023-04-02 | Ensemble of three task-specific Clova DEER             | 0.5984 | 0.7615 | 0.7763    | 0.7473 | 0.7859
2023-03-29 | SCUT-HUAWEI                                            | 0.5812 | 0.7341 | 0.7438    | 0.7246 | 0.7917
2023-03-30 | DBNet++ and SATRN                                      | 0.5162 | 0.7164 | 0.8276    | 0.6315 | 0.7206
2023-04-01 | keba                                                   | 0.4535 | 0.5415 | 0.6764    | 0.4515 | 0.8375
2023-05-15 | nn                                                     | 0.4292 | 0.6068 | 0.6957    | 0.5381 | 0.7072
2023-05-12 | adaptive_clustering                                    | 0.3918 | 0.5370 | 0.6867    | 0.4409 | 0.7295
2023-05-12 | fixed_clustering                                       | 0.3918 | 0.5370 | 0.6867    | 0.4409 | 0.7295
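As a consistency check on the columns: Fscore is the harmonic mean of Precision and Recall, and the table values are consistent with PQ being the F-score weighted by mean tightness (PQ = Fscore x Tightness), as in HierText's panoptic-style metric; treating that relation as an assumption, the top row reproduces to four decimal places:

```python
def harmonic_mean(p, r):
    """F-score as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Top row of the ranking table (Upstage KR, 2023-04-01).
precision, recall, tightness = 0.8205, 0.7725, 0.8797
fscore = harmonic_mean(precision, recall)  # matches the Fscore column, 0.7958
pq = fscore * tightness                    # matches the PQ column, 0.7000
```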

Ranking Graphic