method: Upstage KR2023-04-01

Authors: Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Dahyun Kim, Sehwan Joo

Affiliation: Upstage

Description: To address hierarchical text detection, we use a two-step approach. First, we perform multi-class semantic segmentation, where the classes are word, line, and paragraph regions. Then, we use the predicted probability map to extract these entities and organize them hierarchically. Specifically, we use an ensemble of UNets with ImageNet-pretrained EfficientNetB7/MitB4 backbones to extract class masks. Connected components are identified in the predicted mask to separate words from each other, and likewise for lines and paragraphs. Then, word_i is assigned as a child of line_j if line_j has the highest IoU with word_i among all lines. The same process is applied to lines and paragraphs.
For training, we erode target entities and dilate predicted entities. We also ensure that target entities maintain a gap between them. We use the symmetric Lovasz loss and pretrain our models on the SynthText dataset.
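The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: masks are tiny binary grids, components are 4-connected, IoU is computed on pixel sets, and the helper names (`connected_components`, `iou`, `assign_parents`) are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a binary grid as pixel sets."""
    height, width = len(mask), len(mask[0])
    seen = set()
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels = set()
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_parents(children, parents):
    """For each child, return the index of the parent with the highest IoU.

    A child that overlaps no parent is assigned None (an assumption; the
    description does not say how such cases are handled).
    """
    assignment = []
    for child in children:
        scores = [iou(child, parent) for parent in parents]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        assignment.append(best if best is not None and scores[best] > 0 else None)
    return assignment


# Toy masks: two words on the first line, one word on the second line.
word_mask = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
line_mask = [
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]

words = connected_components(word_mask)
lines = connected_components(line_mask)
word_to_line = assign_parents(words, lines)
```

The same `assign_parents` call could be reused with lines as children and paragraphs as parents to build the next level of the hierarchy.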

method: Upstage KR2023-03-30

Authors: Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Dahyun Kim, Sehwan Joo

Affiliation: Upstage

Description: To address hierarchical text detection, we use a two-step approach. First, we perform multi-class semantic segmentation, where the classes are word, line, and paragraph regions. Then, we use the predicted probability map to extract these entities and organize them hierarchically. Specifically, we use an ensemble of UNets with ImageNet-pretrained EfficientNetB7/MitB4 backbones to extract class masks. Connected components are identified in the predicted mask to separate words from each other, and likewise for lines and paragraphs. Then, word_i is assigned as a child of line_j if line_j has the highest IoU with word_i among all lines. The same process is applied to lines and paragraphs.
For training, we erode target entities and dilate predicted entities. We also ensure that target entities maintain a gap between them. We use the symmetric Lovasz loss and pretrain our models on the SynthText dataset.
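The erode and dilate operations mentioned for training can be illustrated with a small pure-Python sketch. The 3x3 structuring element, the border handling, and the function names are assumptions made for the example; the description does not state the kernel size or how the operations are applied.

```python
def _window(mask, r, c):
    """Values in the 3x3 neighbourhood of (r, c), ignoring out-of-bounds cells."""
    height, width = len(mask), len(mask[0])
    return [
        mask[y][x]
        for y in range(r - 1, r + 2)
        for x in range(c - 1, c + 2)
        if 0 <= y < height and 0 <= x < width
    ]


def erode(mask):
    """A pixel stays foreground only if its whole 3x3 neighbourhood is foreground."""
    return [
        [int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def dilate(mask):
    """A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground."""
    return [
        [int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


# A 3x3 entity inside a 5x5 image.
target = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

shrunk = erode(target)     # training target: only the centre pixel survives
restored = dilate(shrunk)  # dilating the shrunken region recovers the 3x3 block
```

Shrinking the targets keeps neighbouring entities separated in the mask, which helps the connected-component step, and dilating the predictions afterwards restores the regions to roughly their original extent.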

Authors: Zhong Humen, Tang Jun, Yang Zhibo, Song Xiaoge

Affiliation: Alibaba DAMO OCR Team

Email: zhonghumen@gmail.com

Description: Our method is a single end-to-end model for hierarchical text detection. It follows the pipeline of DETR-like methods and uses a hierarchical decoder designed so that the model can detect more text instances with fewer queries, reducing computational cost.
The model uses an ImageNet-pretrained Swin-S backbone and is trained only on the HierText training set. Single-scale inference is used during testing. No external or synthetic data is used.

Ranking Table

| Date | Method | Word PQ | Word F-score | Word Precision | Word Recall | Word Tightness | Line PQ | Line F-score | Line Precision | Line Recall | Line Tightness | Paragraph PQ | Paragraph F-score | Paragraph Precision | Paragraph Recall | Paragraph Tightness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-04-01 | Upstage KR | 0.7980 | 0.9188 | 0.9473 | 0.8920 | 0.8685 | 0.7640 | 0.8834 | 0.9132 | 0.8556 | 0.8648 | 0.7454 | 0.8615 | 0.8740 | 0.8494 | 0.8652 |
| 2023-03-30 | Upstage KR | 0.7948 | 0.9138 | 0.9496 | 0.8807 | 0.8697 | 0.7657 | 0.8797 | 0.9089 | 0.8523 | 0.8704 | 0.7479 | 0.8591 | 0.8711 | 0.8474 | 0.8705 |
| 2023-04-01 | hiertext_submit_0401_curve_199_v2 | 0.7671 | 0.8818 | 0.9271 | 0.8408 | 0.8699 | 0.7143 | 0.8332 | 0.8932 | 0.7807 | 0.8573 | 0.6397 | 0.7483 | 0.8125 | 0.6935 | 0.8548 |
| 2023-04-01 | Global and local instance segmentations for hierarchical text detection | 0.7616 | 0.9072 | 0.9345 | 0.8816 | 0.8395 | 0.6850 | 0.8222 | 0.8024 | 0.8431 | 0.8331 | 0.6255 | 0.7511 | 0.7400 | 0.7625 | 0.8328 |
| 2023-04-02 | DeepSE hierarchical detection model | 0.7530 | 0.8849 | 0.9350 | 0.8399 | 0.8510 | 0.6943 | 0.8243 | 0.8265 | 0.8221 | 0.8423 | 0.6851 | 0.8139 | 0.8169 | 0.8110 | 0.8417 |
| 2023-03-31 | Multi Class Deformable Detr for Hierarchal Text Detection | 0.7320 | 0.8889 | 0.9065 | 0.8720 | 0.8235 | 0.6901 | 0.8413 | 0.8483 | 0.8345 | 0.8202 | 0.6380 | 0.7807 | 0.7707 | 0.7909 | 0.8173 |
| 2023-04-01 | Clova DEER | 0.7175 | 0.9195 | 0.9309 | 0.9083 | 0.7803 | 0.6985 | 0.8900 | 0.9126 | 0.8686 | 0.7848 | 0.6531 | 0.8350 | 0.8378 | 0.8322 | 0.7822 |
| 2023-04-02 | Ensemble of three task-specific Clova DEER | 0.7154 | 0.9203 | 0.9382 | 0.9031 | 0.7774 | 0.6964 | 0.8904 | 0.9175 | 0.8649 | 0.7821 | 0.6529 | 0.8370 | 0.8417 | 0.8323 | 0.7801 |
| 2023-03-31 | Hierarchical Transformers for Text Detection | 0.7044 | 0.8609 | 0.8847 | 0.8383 | 0.8182 | 0.6930 | 0.8523 | 0.8783 | 0.8278 | 0.8131 | 0.6346 | 0.7840 | 0.7784 | 0.7897 | 0.8094 |
| 2023-12-28 | Hi-SAM | 0.6430 | 0.8286 | 0.8766 | 0.7856 | 0.7760 | 0.6696 | 0.8530 | 0.9109 | 0.8020 | 0.7850 | 0.5909 | 0.7597 | 0.8152 | 0.7113 | 0.7779 |
| 2022-08-09 | Unified Detector (CVPR 2022 version) | 0.4821 | 0.6151 | 0.6754 | 0.5647 | 0.7838 | 0.6223 | 0.7991 | 0.7964 | 0.8019 | 0.7787 | 0.5360 | 0.6858 | 0.7604 | 0.6245 | 0.7817 |
| 2023-02-06 | HierText official ckpt | 0.4799 | 0.6135 | 0.6719 | 0.5645 | 0.7822 | 0.6220 | 0.7998 | 0.8000 | 0.7996 | 0.7777 | 0.5351 | 0.6856 | 0.7654 | 0.6208 | 0.7805 |
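
As a reading aid for the table, the reported columns are consistent with F-score being the harmonic mean of Precision and Recall, and PQ being the product of F-score and Tightness. The short check below uses the word-level numbers from the first row (Upstage KR, 2023-04-01); the function names are illustrative only, and small differences are expected because the table values are rounded to four decimals.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def panoptic_quality(f, tightness):
    """PQ taken as F-score times tightness (mean IoU of matched pairs)."""
    return f * tightness


# Word-level values reported for the top-ranked submission.
precision, recall, tightness = 0.9473, 0.8920, 0.8685
reported_f, reported_pq = 0.9188, 0.7980

f = f_score(precision, recall)
pq = panoptic_quality(f, tightness)

print(f"F-score: computed {f:.4f}, reported {reported_f:.4f}")
print(f"PQ:      computed {pq:.4f}, reported {reported_pq:.4f}")
```

This also explains why a method can have a higher F-score yet a lower PQ than another: Clova DEER, for example, leads on word F-score but ranks lower on PQ because its Tightness is lower.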

Ranking Graphic

Ranking Graphic - Line PQ

Ranking Graphic - Paragraph PQ