method: Global and local instance segmentations for hierarchical text detection (2023-04-01)
Authors: Xingran Zhao, Jing Xian, Yadong Li, Hongbin Wang
Affiliation: AntGroup
Email: zhaoxingran.zxr@antgroup.com;xianjing.xj@antgroup;liyadong.lyd@antgroup.com;hongbin.whb@antgroup.com
Description: For word and line detection, we first crop patches from the images to obtain local mask results. Second, we obtain global mask results by using the full images as input. Third, we merge the global and local results with an NMS post-processing step. For paragraph detection, we use only the full images as input and obtain global mask results. All detectors are CBNetV2 [1] with HTC [2]. For hierarchical text detection, we use IOS (intersection-of-sets) as the metric to assign words to lines, and use the same strategy to assign lines to paragraphs.
[1] CBNetV2: A Composite Backbone Network Architecture for Object Detection.
[2] Hybrid Task Cascade for Instance Segmentation.
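The word-to-line and line-to-paragraph assignment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the description does not define IOS precisely, so the sketch assumes it is the intersection area divided by the child's own area. The helper names (`ios`, `assign_children`, `rect`), the pixel-set mask representation, and the `min_score` threshold are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]   # (row, col)
Mask = Set[Pixel]         # a mask represented as a set of pixel coordinates


def ios(child: Mask, parent: Mask) -> float:
    """Assumed IOS definition: |child ∩ parent| / |child|."""
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def assign_children(
    children: List[Mask],
    parents: List[Mask],
    min_score: float = 0.5,  # hypothetical threshold, not from the source
) -> List[Optional[int]]:
    """For each child mask, return the index of the best parent, or None."""
    assignment: List[Optional[int]] = []
    for child in children:
        scores = [ios(child, parent) for parent in parents]
        best = max(range(len(parents)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < min_score:
            assignment.append(None)  # no parent overlaps enough
        else:
            assignment.append(best)
    return assignment


def rect(top: int, left: int, bottom: int, right: int) -> Mask:
    """Axis-aligned rectangle mask, used only to build toy examples."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


# Toy data: two line masks and three word masks (the last word is isolated).
lines = [rect(0, 0, 10, 100), rect(12, 0, 22, 100)]
words = [rect(1, 5, 9, 30), rect(13, 40, 21, 70), rect(50, 50, 60, 60)]
word_to_line = assign_children(words, lines)
print(word_to_line)
```

The same function could be reused to assign lines to paragraphs by passing line masks as `children` and paragraph masks as `parents`.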
method: Clova DEER (2023-04-01)
Authors: Song Kayeon, Taeho Kil, Donghyun Kim, Sukmin Seo
Affiliation: Naver Cloud
Description: Our model passes images through a CNN and a deformable transformer encoder to extract multi-scale visual features. Then, an independent segmentation head is used to extract words, lines, and paragraphs. Additionally, text recognition results are obtained through a deformable transformer decoder. In summary, our single model performs both layout detection (task 1) and OCR (task 2) simultaneously.
method: Hi-SAM (2023-12-28)
Authors: Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao
Description: A unified text segmentation model across four hierarchies, including stroke, word, text-line, and paragraph, while realizing layout analysis as well. Only the training data of HierText is adopted.
Date | Method | Word PQ | Word F-score | Word Precision | Word Recall | Word Tightness | Line PQ | Line F-score | Line Precision | Line Recall | Line Tightness | Paragraph PQ | Paragraph F-score | Paragraph Precision | Paragraph Recall | Paragraph Tightness
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2023-04-01 | Global and local instance segmentations for hierarchical text detection | 0.7616 | 0.9072 | 0.9345 | 0.8816 | 0.8395 | 0.6850 | 0.8222 | 0.8024 | 0.8431 | 0.8331 | 0.6255 | 0.7511 | 0.7400 | 0.7625 | 0.8328
2023-04-01 | Clova DEER | 0.7175 | 0.9195 | 0.9309 | 0.9083 | 0.7803 | 0.6985 | 0.8900 | 0.9126 | 0.8686 | 0.7848 | 0.6531 | 0.8350 | 0.8378 | 0.8322 | 0.7822
2023-12-28 | Hi-SAM | 0.6430 | 0.8286 | 0.8766 | 0.7856 | 0.7760 | 0.6696 | 0.8530 | 0.9109 | 0.8020 | 0.7850 | 0.5909 | 0.7597 | 0.8152 | 0.7113 | 0.7779
2022-08-09 | Unified Detector (CVPR 2022 version) | 0.4821 | 0.6151 | 0.6754 | 0.5647 | 0.7838 | 0.6223 | 0.7991 | 0.7964 | 0.8019 | 0.7787 | 0.5360 | 0.6858 | 0.7604 | 0.6245 | 0.7817
2023-02-06 | HierText official ckpt | 0.4799 | 0.6135 | 0.6719 | 0.5645 | 0.7822 | 0.6220 | 0.7998 | 0.8000 | 0.7996 | 0.7777 | 0.5351 | 0.6856 | 0.7654 | 0.6208 | 0.7805
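
The columns in the table are related to each other: the reported values are consistent with the F-score being the harmonic mean of precision and recall, and with PQ being the product of F-score and tightness. The sketch below checks this on the word-level numbers of the first two rows. The function names and the tolerance are illustrative choices, and small deviations are expected because the table values are rounded to four decimals.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def panoptic_quality(f: float, tightness: float) -> float:
    """PQ as the product of F-score and tightness."""
    return f * tightness


# Word-level values copied from the table:
# (precision, recall, tightness, reported F-score, reported PQ)
reported = {
    "Global and local instance segmentations": (0.9345, 0.8816, 0.8395, 0.9072, 0.7616),
    "Clova DEER": (0.9309, 0.9083, 0.7803, 0.9195, 0.7175),
}

TOLERANCE = 1e-3  # illustrative; allows for rounding in the table

consistent = {}
for name, (p, r, t, f_reported, pq_reported) in reported.items():
    f = f_score(p, r)
    pq = panoptic_quality(f, t)
    consistent[name] = (
        abs(f - f_reported) < TOLERANCE and abs(pq - pq_reported) < TOLERANCE
    )
    print(f"{name}: F={f:.4f} (reported {f_reported}), PQ={pq:.4f} (reported {pq_reported})")
```

One practical reading of this relation: a method can have a higher F-score but a lower PQ than another if its matched masks are less tight, as the word-level columns of the first two rows show.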