method: Upstage KR (2023-04-01)
Authors: Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Dahyun Kim, Sehwan Joo
Affiliation: Upstage
Description: To address hierarchical text detection, we use a two-step approach. First, we perform multi-class semantic segmentation, where the classes are word, line, and paragraph regions. Then, we use the predicted probability map to extract these entities and organize them hierarchically. Specifically, we use an ensemble of UNets with ImageNet-pretrained EfficientNetB7/MitB4 backbones to extract class masks. Connected components are identified in the predicted mask to separate words from each other, and likewise for lines and paragraphs. Then, word_i is assigned as a child of line_j if line_j has the highest IoU with word_i among all lines. The same procedure assigns lines to paragraphs.
For training, we erode target entities and dilate predicted entities. We also ensure that target entities maintain a gap between them. We use the symmetric Lovasz loss, and we pretrain our models on the SynthText dataset.
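The grouping step described above (connected components per class, then assigning each word to the line with the highest IoU) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the masks are already binarized, uses 4-connectivity, computes IoU on pixel sets, and leaves a child unassigned when no parent overlaps it. The function names `connected_components`, `iou`, and `assign_children` are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a binary mask.

    `mask` is a list of rows of 0/1 values. Each region is a set of
    (row, col) pixels; regions are returned in row-major discovery order.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = set()
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_children(children, parents):
    """Map each child region to the index of its highest-IoU parent.

    Assumption (not stated in the description): a child that overlaps
    no parent is left unassigned (None).
    """
    assignment = []
    for child in children:
        scores = [iou(child, parent) for parent in parents]
        best = max(range(len(parents)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] == 0.0:
            best = None
        assignment.append(best)
    return assignment


# Toy masks: three words, two lines.
word_mask = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
line_mask = [
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]

words = connected_components(word_mask)
lines = connected_components(line_mask)
word_to_line = assign_children(words, lines)
print(word_to_line)
```

In this toy case the first two words fall inside the first line and the third word coincides with the second line. The same `assign_children` call could be reused with line regions as children and paragraph regions as parents.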
method: Upstage KR (2023-03-30)
Authors: Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Dahyun Kim, Sehwan Joo
Affiliation: Upstage
Description: To address hierarchical text detection, we use a two-step approach. First, we perform multi-class semantic segmentation, where the classes are word, line, and paragraph regions. Then, we use the predicted probability map to extract these entities and organize them hierarchically. Specifically, we use an ensemble of UNets with ImageNet-pretrained EfficientNetB7/MitB4 backbones to extract class masks. Connected components are identified in the predicted mask to separate words from each other, and likewise for lines and paragraphs. Then, word_i is assigned as a child of line_j if line_j has the highest IoU with word_i among all lines. The same procedure assigns lines to paragraphs.
For training, we erode target entities and dilate predicted entities. We also ensure that target entities maintain a gap between them. We use the symmetric Lovasz loss, and we pretrain our models on the SynthText dataset.
method: hiertext_submit_0401_curve_199_v2 (2023-04-01)
Authors: Zhong Humen, Tang Jun, Yang zhibo, Song xiaoge
Affiliation: Alibaba DAMO OCR Team
Email: zhonghumen@gmail.com
Description: Our method is a single end-to-end model designed for hierarchical text detection. It follows the pipeline of DETR-like methods and adds a hierarchical decoder so that the model can detect more text instances with fewer queries, reducing computational cost.
The model uses an ImageNet-pretrained Swin-S backbone and is trained only on the HierText training set. Single-scale inference is used during testing. No external or synthetic data is used.
Date | Method | Word PQ | Word Fscore | Word Precision | Word Recall | Word Tightness | Line PQ | Line Fscore | Line Precision | Line Recall | Line Tightness | Paragraph PQ | Paragraph Fscore | Paragraph Precision | Paragraph Recall | Paragraph Tightness
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2023-04-01 | Upstage KR | 0.7980 | 0.9188 | 0.9473 | 0.8920 | 0.8685 | 0.7640 | 0.8834 | 0.9132 | 0.8556 | 0.8648 | 0.7454 | 0.8615 | 0.8740 | 0.8494 | 0.8652
2023-03-30 | Upstage KR | 0.7948 | 0.9138 | 0.9496 | 0.8807 | 0.8697 | 0.7657 | 0.8797 | 0.9089 | 0.8523 | 0.8704 | 0.7479 | 0.8591 | 0.8711 | 0.8474 | 0.8705
2023-04-01 | hiertext_submit_0401_curve_199_v2 | 0.7671 | 0.8818 | 0.9271 | 0.8408 | 0.8699 | 0.7143 | 0.8332 | 0.8932 | 0.7807 | 0.8573 | 0.6397 | 0.7483 | 0.8125 | 0.6935 | 0.8548
2023-04-01 | Global and local instance segmentations for hierarchical text detection | 0.7616 | 0.9072 | 0.9345 | 0.8816 | 0.8395 | 0.6850 | 0.8222 | 0.8024 | 0.8431 | 0.8331 | 0.6255 | 0.7511 | 0.7400 | 0.7625 | 0.8328
2023-04-02 | DeepSE hierarchical detection model | 0.7530 | 0.8849 | 0.9350 | 0.8399 | 0.8510 | 0.6943 | 0.8243 | 0.8265 | 0.8221 | 0.8423 | 0.6851 | 0.8139 | 0.8169 | 0.8110 | 0.8417
2023-03-31 | Multi Class Deformable Detr for Hierarchal Text Detection | 0.7320 | 0.8889 | 0.9065 | 0.8720 | 0.8235 | 0.6901 | 0.8413 | 0.8483 | 0.8345 | 0.8202 | 0.6380 | 0.7807 | 0.7707 | 0.7909 | 0.8173
2023-04-01 | Clova DEER | 0.7175 | 0.9195 | 0.9309 | 0.9083 | 0.7803 | 0.6985 | 0.8900 | 0.9126 | 0.8686 | 0.7848 | 0.6531 | 0.8350 | 0.8378 | 0.8322 | 0.7822
2023-04-02 | Ensemble of three task-specific Clova DEER | 0.7154 | 0.9203 | 0.9382 | 0.9031 | 0.7774 | 0.6964 | 0.8904 | 0.9175 | 0.8649 | 0.7821 | 0.6529 | 0.8370 | 0.8417 | 0.8323 | 0.7801
2023-03-31 | Hierarchical Transformers for Text Detection | 0.7044 | 0.8609 | 0.8847 | 0.8383 | 0.8182 | 0.6930 | 0.8523 | 0.8783 | 0.8278 | 0.8131 | 0.6346 | 0.7840 | 0.7784 | 0.7897 | 0.8094
2023-12-28 | Hi-SAM | 0.6430 | 0.8286 | 0.8766 | 0.7856 | 0.7760 | 0.6696 | 0.8530 | 0.9109 | 0.8020 | 0.7850 | 0.5909 | 0.7597 | 0.8152 | 0.7113 | 0.7779
2022-08-09 | Unified Detector (CVPR 2022 version) | 0.4821 | 0.6151 | 0.6754 | 0.5647 | 0.7838 | 0.6223 | 0.7991 | 0.7964 | 0.8019 | 0.7787 | 0.5360 | 0.6858 | 0.7604 | 0.6245 | 0.7817
2023-02-06 | HierText official ckpt | 0.4799 | 0.6135 | 0.6719 | 0.5645 | 0.7822 | 0.6220 | 0.7998 | 0.8000 | 0.7996 | 0.7777 | 0.5351 | 0.6856 | 0.7654 | 0.6208 | 0.7805
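
The columns in each group appear to be related: Fscore is the harmonic mean of Precision and Recall, and PQ is approximately Fscore × Tightness, which is the usual decomposition of panoptic quality into a recognition term and a segmentation term. The page does not state this relation, so the sketch below is an assumption, checked only against the rounded values in the table. The helper names `f_score` and `panoptic_quality` are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def panoptic_quality(precision, recall, tightness):
    """Assumed relation: PQ = Fscore * Tightness."""
    return f_score(precision, recall) * tightness


# Word-level values from the Upstage KR (2023-04-01) row.
precision, recall, tightness = 0.9473, 0.8920, 0.8685
reported_fscore, reported_pq = 0.9188, 0.7980

fscore = f_score(precision, recall)
pq = panoptic_quality(precision, recall, tightness)

# The inputs are rounded to four decimals, so compare with a small tolerance.
print(abs(fscore - reported_fscore) < 1e-3, abs(pq - reported_pq) < 1e-3)
```

This explains why a method with a high Fscore but lower Tightness (for example, Clova DEER at the word level) can rank below a method with a slightly lower Fscore and tighter masks.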