- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
Method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.
Method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.
Method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. For training, we also add the ICDAR19 MLT datasets; both the training and validation sets are used to produce the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in neural information processing systems. 2017: 3146-3154.
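
The ensemble step described above (bring each model's confidence scores onto a common scale, then keep the best box among overlapping candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-model `rescorers` callables are a hypothetical stand-in for the trained GBDT, greedy IoU-based suppression stands in for their box selection, and axis-aligned boxes replace quadrilaterals for brevity. The names `iou` and `merge_detections` are illustrative only.

```python
from typing import Callable, List, Tuple

# (x_min, y_min, x_max, y_max, score). Axis-aligned boxes are a simplification;
# the submission described above selects quadrilateral boxes.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    per_model: List[List[Box]],
    rescorers: List[Callable[[float], float]],
    iou_thresh: float = 0.5,
) -> List[Box]:
    """Pool boxes from all models and keep the best of each overlapping group.

    `rescorers[i]` maps model i's raw score to a common scale; it is a
    hypothetical placeholder for a learned score normalizer such as a GBDT.
    """
    pooled = [
        (x0, y0, x1, y1, rescore(score))
        for boxes, rescore in zip(per_model, rescorers)
        for (x0, y0, x1, y1, score) in boxes
    ]
    # Highest normalized score first, then greedy suppression of overlaps.
    pooled.sort(key=lambda box: box[4], reverse=True)
    kept: List[Box] = []
    for candidate in pooled:
        if all(iou(candidate, k) < iou_thresh for k in kept):
            kept.append(candidate)
    return kept


# Toy example: model B's scores are assumed to be under-confident.
model_a = [(0.0, 0.0, 10.0, 10.0, 0.9)]
model_b = [(1.0, 1.0, 11.0, 11.0, 0.6), (50.0, 50.0, 60.0, 60.0, 0.7)]
merged = merge_detections(
    [model_a, model_b],
    [lambda s: s, lambda s: min(1.0, s * 1.6)],
)
```

In the toy example the two overlapping boxes collapse to a single detection, and which one survives depends on the normalized scores rather than the raw ones, which is the reason for normalizing before merging.
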
Date | Method | Hmean | Precision | Recall | Average Precision |
---|---|---|---|---|---|
2020-04-16 | TH | 59.17% | 44.09% | 89.91% | 79.49% |
2019-11-08 | Sogou_OCR | 57.05% | 42.33% | 87.46% | 68.55% |
2020-04-22 | AntAI-Cognition | 56.68% | 41.26% | 90.51% | 74.90% |
2024-04-02 | FPDIoU | 56.62% | 46.96% | 71.28% | 32.79% |
2021-03-21 | OSKDet | 56.12% | 42.00% | 84.54% | 70.22% |
2023-05-22 | DeepSolo++ (ResNet-50) | 55.30% | 41.29% | 83.70% | 74.98% |
2019-03-29 | GNNets (single scale) | 52.80% | 39.41% | 79.97% | 47.89% |
2018-11-20 | Pixel-Anchor | 52.70% | 39.01% | 81.16% | 42.19% |
2019-08-08 | JDAI | 52.31% | 37.11% | 88.60% | 73.66% |
2019-03-19 | ccnet single scale | 52.16% | 37.49% | 85.71% | 68.08% |
2024-03-14 | gts | 52.12% | 39.47% | 76.72% | 64.64% |
2020-09-28 | DCLNet | 51.90% | 37.60% | 83.76% | 32.20% |
2019-05-30 | PMTD | 51.69% | 36.90% | 86.29% | 75.91% |
2019-05-08 | Baidu-VIS | 51.66% | 37.18% | 84.62% | 30.96% |
2021-05-03 | NCU_MSP | 51.65% | 37.83% | 81.37% | 30.93% |
2019-03-23 | PMTD | 51.22% | 36.71% | 84.73% | 69.90% |
2019-06-02 | NJU-ImagineLab | 50.41% | 35.04% | 89.81% | 75.59% |
2021-03-25 | NCU_MSP | 50.15% | 36.25% | 81.33% | 29.96% |
2019-08-20 | juxinli | 49.70% | 35.76% | 81.45% | 66.44% |
2021-11-02 | fpa | 49.54% | 35.60% | 81.39% | 66.31% |
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 49.36% | 34.92% | 84.17% | 30.76% |
2019-11-05 | baseline_maskrcnn | 48.55% | 34.43% | 82.30% | 60.07% |
2020-12-08 | cascade | 48.00% | 34.38% | 79.50% | 65.20% |
2019-06-11 | 4Paradigm-Data-Intelligence | 47.35% | 32.46% | 87.48% | 28.06% |
2019-05-23 | 4Paradigm-Data-Intelligence | 47.34% | 32.51% | 87.05% | 28.48% |
2018-01-22 | FOTS_v2 | 47.05% | 33.05% | 81.66% | 65.40% |
2020-10-16 | Drew | 46.94% | 33.24% | 79.85% | 63.57% |
2022-04-11 | TextBPN++(ResNet-50) | 46.72% | 33.91% | 75.09% | 26.78% |
2018-10-29 | Amap-CVLab | 46.39% | 32.63% | 80.22% | 66.26% |
2021-12-12 | a | 46.18% | 31.67% | 85.26% | 27.63% |
2019-07-15 | stela | 46.09% | 32.67% | 78.26% | 57.95% |
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 45.98% | 32.86% | 76.53% | 25.15% |
2021-05-03 | adapt | 45.10% | 30.97% | 82.98% | 26.45% |
2021-12-12 | b | 44.99% | 30.58% | 85.08% | 26.55% |
2017-11-09 | EAST++ | 44.21% | 30.80% | 78.34% | 27.79% |
2019-09-18 | mask RCNN Augment+ | 43.89% | 31.61% | 71.78% | 48.67% |
2019-12-13 | BDN | 43.62% | 29.16% | 86.52% | 25.00% |
2021-12-31 | TextPMs | 43.52% | 30.59% | 75.42% | 23.82% |
2018-11-28 | CRAFT | 43.46% | 31.91% | 68.09% | 22.40% |
2018-11-15 | USTC-NELSLIP | 43.44% | 29.18% | 84.97% | 69.13% |
2017-06-28 | SCUT_DLVClab1 | 43.07% | 31.35% | 68.79% | 47.05% |
2020-10-21 | gccnet-ensemble | 42.81% | 28.73% | 83.96% | 58.86% |
2021-03-03 | NCU_MSP_light | 42.44% | 28.95% | 79.50% | 23.49% |
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 41.71% | 28.26% | 79.62% | 22.37% |
2020-12-08 | corner | 41.36% | 28.55% | 75.01% | 54.07% |
2018-08-23 | Sogou_MM | 41.00% | 27.07% | 84.50% | 67.28% |
2018-12-22 | PKU_VDIG | 40.56% | 26.79% | 83.53% | 66.55% |
2021-05-17 | NCU_FPN | 40.42% | 26.75% | 82.67% | 22.65% |
2023-12-17 | mlt_ch_03 | 40.26% | 27.92% | 72.17% | 21.17% |
2018-03-12 | ATL Cangjie OCR | 40.05% | 26.58% | 81.19% | 64.38% |
2019-01-08 | ALGCD_CP | 39.92% | 26.57% | 80.24% | 24.09% |
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 39.66% | 26.77% | 76.49% | 20.54% |
2018-12-05 | EPTN-SJTU | 38.53% | 25.66% | 77.32% | 21.78% |
2019-05-30 | Thesis-SE | 37.74% | 25.09% | 76.08% | 21.23% |
2017-06-29 | SARI_FDU_RRPN_v1 | 34.74% | 22.99% | 71.06% | 51.15% |
2018-12-13 | AutoCV | 33.15% | 20.79% | 81.76% | 42.92% |
2022-01-05 | dbnet_resnet18 | 33.13% | 21.38% | 73.55% | 44.17% |
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 31.65% | 19.72% | 80.12% | 15.98% |
2017-06-28 | SARI_FDU_RRPN_v0 | 29.58% | 18.86% | 68.59% | 35.58% |
2017-06-30 | TH-DL | 29.11% | 20.54% | 49.94% | 30.05% |
2019-01-03 | YY AI OCR Group | 25.51% | 16.44% | 56.96% | 11.36% |
2017-06-30 | Sensetime OCR | 16.23% | 9.03% | 80.09% | 47.00% |
2019-10-14 | TextSnake | 5.35% | 3.07% | 20.52% | 0.64% |
2017-06-30 | linkage-ER-Flow | 4.81% | 2.95% | 13.05% | 0.99% |
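
The Hmean column by which the table is ranked is consistent with the harmonic mean of Precision and Recall (the F1 score). The short check below recomputes it for the top three rows; the function name `hmean` and the `rows` list are illustrative, and the figures are copied from the table. How Average Precision is computed is not stated here, so it is left out.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %, reported Hmean %), taken from the table above.
rows = [
    ("TH", 44.09, 89.91, 59.17),
    ("Sogou_OCR", 42.33, 87.46, 57.05),
    ("AntAI-Cognition", 41.26, 90.51, 56.68),
]

for method, precision, recall, reported in rows:
    computed = hmean(precision, recall)
    print(f"{method}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Because the harmonic mean is dominated by the smaller of the two values, methods with recall near 90% but precision near 40% still land below 60% Hmean.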