- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We built an end-to-end scene text spotter based on Mask R-CNN and Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.
method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.
method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. For training, we also add the ICDAR19 MLT datasets; both training and validation sets are used to obtain the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
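The selection step described for AntAI-Cognition (pool the boxes from several detectors, rescore them on a common scale, keep the best box among overlapping candidates) can be illustrated roughly as follows. This is only a minimal sketch under loud assumptions, not the authors' implementation: boxes are simplified to axis-aligned rectangles instead of quadrilaterals, a greedy overlap filter stands in for whatever selection rule the team used, and the `quality` callable is a hypothetical placeholder for the trained GBDT scorer.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); axis-aligned simplification
Detection = Tuple[Box, float, int]       # (box, raw confidence, index of source model)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(
    detections: List[Detection],
    quality: Callable[[Detection], float],  # hypothetical stand-in for the GBDT scorer
    iou_threshold: float = 0.5,             # assumed value, not taken from the source
) -> List[Tuple[Box, float]]:
    """Pool detections from all models, rescore them, and greedily keep
    the highest-scoring box among each group of overlapping candidates."""
    rescored = sorted(
        ((quality(d), d[0]) for d in detections),
        key=lambda item: item[0],
        reverse=True,
    )
    kept: List[Tuple[Box, float]] = []
    for score, box in rescored:
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


def toy_quality(det: Detection) -> float:
    """Toy scorer: rescale each model's confidence by a made-up per-model factor."""
    _, conf, model = det
    return conf * (1.0, 0.8, 0.9)[model]


dets = [
    ((0.0, 0.0, 10.0, 10.0), 0.90, 1),
    ((1.0, 1.0, 10.0, 10.0), 0.80, 0),
    ((20.0, 20.0, 30.0, 30.0), 0.70, 2),
]
selected = select_boxes(dets, toy_quality)
```

In the toy data the first two boxes overlap heavily. After rescoring, the box from model 0 outranks the box from model 1 even though its raw confidence is lower, which is the kind of cross-model normalization the description attributes to the GBDT.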
Date | Method | Hmean | Precision | Recall | Average Precision
---|---|---|---|---|---
2020-04-16 | TH | 55.77% | 41.01% | 87.12% | 73.23%
2019-11-08 | Sogou_OCR | 53.96% | 39.23% | 86.38% | 68.24%
2020-04-22 | AntAI-Cognition | 52.67% | 37.81% | 86.78% | 68.83%
2021-03-21 | OSKDet | 51.54% | 37.91% | 80.46% | 54.55%
2023-05-22 | DeepSolo++ (ResNet-50) | 50.96% | 37.53% | 79.35% | 61.44%
2019-08-08 | JDAI | 49.37% | 34.52% | 86.65% | 74.80%
2019-05-30 | PMTD | 48.30% | 33.75% | 84.95% | 72.61%
2024-03-14 | gts | 48.30% | 36.10% | 72.95% | 54.75%
2019-06-02 | NJU-ImagineLab | 47.22% | 32.46% | 86.61% | 68.80%
2019-03-29 | GNNets (single scale) | 47.15% | 33.90% | 77.45% | 51.73%
2019-05-08 | Baidu-VIS | 46.60% | 32.89% | 79.96% | 26.26%
2019-11-05 | baseline_maskrcnn | 46.07% | 31.92% | 82.76% | 62.72%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 45.77% | 31.78% | 81.77% | 25.75%
2021-05-03 | NCU_MSP | 45.65% | 32.60% | 76.15% | 24.25%
2019-03-23 | PMTD | 45.64% | 32.09% | 78.99% | 52.51%
2019-08-20 | juxinli | 45.47% | 32.11% | 77.83% | 52.30%
2021-11-02 | fpa | 45.32% | 31.98% | 77.81% | 52.17%
2018-11-20 | Pixel-Anchor | 44.36% | 32.15% | 71.54% | 27.03%
2021-03-25 | NCU_MSP | 44.35% | 31.27% | 76.21% | 22.93%
2020-12-08 | cascade | 44.08% | 30.95% | 76.51% | 59.75%
2024-04-02 | FPDIoU | 43.84% | 34.96% | 58.76% | 20.77%
2018-01-22 | FOTS_v2 | 43.40% | 29.89% | 79.20% | 41.42%
2020-09-28 | DCLNet | 43.34% | 30.59% | 74.29% | 22.47%
2019-03-19 | ccnet single scale | 43.18% | 30.21% | 75.68% | 45.22%
2019-06-11 | 4Paradigm-Data-Intelligence | 43.11% | 29.07% | 83.37% | 24.42%
2019-05-23 | 4Paradigm-Data-Intelligence | 42.96% | 29.06% | 82.32% | 24.00%
2017-06-28 | SCUT_DLVClab1 | 42.20% | 29.86% | 71.92% | 52.71%
2022-04-11 | TextBPN++(ResNet-50) | 42.13% | 29.59% | 73.12% | 21.00%
2018-10-29 | Amap-CVLab | 41.86% | 28.52% | 78.63% | 51.64%
2019-12-13 | BDN | 41.08% | 26.95% | 86.32% | 23.12%
2020-10-16 | Drew | 40.54% | 28.11% | 72.67% | 50.25%
2018-11-28 | CRAFT | 40.43% | 28.37% | 70.34% | 19.96%
2021-12-12 | a | 40.30% | 27.03% | 79.16% | 20.92%
2021-05-03 | adapt | 40.12% | 26.89% | 78.91% | 20.42%
2017-11-09 | EAST++ | 40.08% | 27.29% | 75.47% | 34.10%
2021-12-12 | b | 39.99% | 26.71% | 79.60% | 20.55%
2023-12-17 | mlt_ch_03 | 39.87% | 26.87% | 77.26% | 20.46%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 39.63% | 27.08% | 73.87% | 20.21%
2021-12-31 | TextPMs | 38.89% | 26.48% | 73.16% | 19.02%
2021-03-03 | NCU_MSP_light | 38.50% | 25.70% | 76.78% | 19.07%
2019-09-18 | mask RCNN Augment+ | 38.35% | 26.80% | 67.41% | 41.17%
2020-10-21 | gccnet-ensemble | 38.21% | 25.16% | 79.43% | 56.05%
2018-11-15 | USTC-NELSLIP | 38.09% | 24.99% | 80.04% | 46.23%
2020-12-08 | corner | 37.89% | 25.57% | 73.12% | 52.25%
2018-12-22 | PKU_VDIG | 37.80% | 24.42% | 83.58% | 65.44%
2021-05-17 | NCU_FPN | 37.07% | 24.05% | 80.86% | 18.85%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 36.42% | 24.01% | 75.41% | 18.22%
2019-01-08 | ALGCD_CP | 36.41% | 23.88% | 76.63% | 32.94%
2018-08-23 | Sogou_MM | 36.00% | 23.34% | 78.67% | 57.38%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 35.96% | 23.95% | 72.19% | 17.42%
2018-12-05 | EPTN-SJTU | 33.30% | 21.52% | 73.52% | 33.46%
2019-05-30 | Thesis-SE | 32.77% | 21.27% | 71.39% | 30.12%
2018-03-12 | ATL Cangjie OCR | 32.11% | 20.84% | 69.98% | 35.50%
2019-07-15 | stela | 32.07% | 22.31% | 57.05% | 28.65%
2017-06-29 | SARI_FDU_RRPN_v1 | 30.15% | 19.23% | 69.71% | 40.70%
2018-12-13 | AutoCV | 29.06% | 17.81% | 78.78% | 36.49%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 27.97% | 17.16% | 75.68% | 13.15%
2022-01-05 | dbnet_resnet18 | 26.67% | 16.63% | 67.41% | 26.29%
2017-06-28 | SARI_FDU_RRPN_v0 | 25.27% | 15.60% | 66.48% | 36.44%
2017-06-30 | TH-DL | 25.09% | 16.98% | 48.08% | 20.81%
2019-01-03 | YY AI OCR Group | 23.12% | 14.40% | 58.51% | 15.26%
2017-06-30 | linkage-ER-Flow | 17.34% | 10.30% | 54.72% | 12.37%
2017-06-30 | Sensetime OCR | 14.98% | 8.27% | 80.04% | 40.38%
2019-10-14 | TextSnake | 9.34% | 5.28% | 40.34% | 2.11%
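
The Hmean column is consistent with the harmonic mean of Precision and Recall, which explains why methods with high recall but low precision still rank in the middle of the table. As a quick check, and assuming that definition, the sketch below recomputes Hmean for the top three rows. The `hmean` helper name is ours, and small differences from the table are expected because the published precision and recall values are already rounded.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, here percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall) in percent, copied from the first three table rows.
top_rows = {
    "TH": (41.01, 87.12),
    "Sogou_OCR": (39.23, 86.38),
    "AntAI-Cognition": (37.81, 86.78),
}

recomputed = {name: hmean(p, r) for name, (p, r) in top_rows.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}%")
```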