- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.
method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as our backbone, and multi-scale training and testing are adopted to get the final results.
method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training data; both the training and validation sets are used to get the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
Date | Method | Hmean | Precision | Recall | Average Precision
---|---|---|---|---|---
2020-04-16 | TH | 70.94% | 62.53% | 81.97% | 67.52%
2019-11-08 | Sogou_OCR | 69.37% | 60.97% | 80.46% | 65.51%
2020-04-22 | AntAI-Cognition | 68.30% | 59.02% | 81.04% | 56.02%
2021-03-21 | OSKDet | 67.10% | 59.52% | 76.89% | 63.68%
2019-05-30 | PMTD | 64.77% | 55.26% | 78.23% | 63.79%
2019-08-08 | JDAI | 64.38% | 55.33% | 76.96% | 61.58%
2019-05-08 | Baidu-VIS | 64.31% | 55.42% | 76.59% | 41.52%
2023-05-22 | DeepSolo++ (ResNet-50) | 63.06% | 57.33% | 70.07% | 56.41%
2019-03-23 | PMTD | 63.05% | 53.85% | 76.05% | 61.26%
2019-11-05 | baseline_maskrcnn | 62.72% | 53.14% | 76.52% | 59.67%
2020-09-28 | DCLNet | 62.38% | 53.15% | 75.48% | 40.36%
2019-08-20 | juxinli | 62.26% | 53.63% | 74.20% | 59.85%
2019-06-11 | 4Paradigm-Data-Intelligence | 62.20% | 50.86% | 80.06% | 39.66%
2021-11-02 | fpa | 62.15% | 53.49% | 74.17% | 59.74%
2021-05-03 | NCU_MSP | 61.95% | 54.25% | 72.20% | 39.75%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 61.79% | 51.90% | 76.32% | 40.10%
2019-06-02 | NJU-ImagineLab | 61.30% | 51.50% | 75.70% | 58.97%
2019-05-23 | 4Paradigm-Data-Intelligence | 61.04% | 50.33% | 77.53% | 37.92%
2021-03-25 | NCU_MSP | 60.86% | 52.74% | 71.93% | 38.40%
2021-12-12 | a | 58.54% | 47.82% | 75.46% | 36.26%
2022-04-11 | TextBPN++(ResNet-50) | 58.53% | 50.16% | 70.25% | 35.89%
2021-12-31 | TextPMs | 57.74% | 48.46% | 71.44% | 35.14%
2020-12-08 | cascade | 57.71% | 50.22% | 67.83% | 50.04%
2019-03-19 | ccnet single scale | 57.62% | 49.65% | 68.63% | 47.01%
2021-12-12 | b | 57.40% | 46.64% | 74.62% | 34.91%
2021-05-03 | adapt | 56.87% | 46.75% | 72.57% | 34.16%
2020-10-16 | Drew | 56.74% | 48.75% | 67.86% | 51.43%
2018-11-20 | Pixel-Anchor | 56.66% | 50.77% | 64.09% | 34.21%
2019-03-29 | GNNets (single scale) | 56.54% | 49.30% | 66.29% | 44.67%
2019-12-13 | BDN | 56.27% | 44.91% | 75.30% | 32.80%
2018-10-29 | Amap-CVLab | 56.25% | 47.01% | 70.00% | 49.59%
2018-11-15 | USTC-NELSLIP | 55.58% | 43.95% | 75.59% | 60.29%
2018-01-22 | FOTS_v2 | 55.33% | 47.58% | 66.08% | 46.29%
2018-11-28 | CRAFT | 55.23% | 48.07% | 64.89% | 30.47%
2021-03-03 | NCU_MSP_light | 55.10% | 45.57% | 69.64% | 32.14%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 54.57% | 45.44% | 68.28% | 30.21%
2021-05-17 | NCU_FPN | 53.91% | 42.62% | 73.33% | 31.40%
2018-12-22 | PKU_VDIG | 53.84% | 41.91% | 75.29% | 54.46%
2019-07-15 | stela | 53.32% | 45.31% | 64.76% | 45.50%
2018-03-12 | ATL Cangjie OCR | 53.18% | 42.20% | 71.88% | 56.17%
2020-12-08 | corner | 52.36% | 43.72% | 65.26% | 46.69%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 52.06% | 43.26% | 65.38% | 27.39%
2017-06-28 | SCUT_DLVClab1 | 51.27% | 45.80% | 58.24% | 38.59%
2020-10-21 | gccnet-ensemble | 51.26% | 41.40% | 67.26% | 42.97%
2017-11-09 | EAST++ | 51.07% | 43.27% | 62.30% | 29.75%
2019-01-08 | ALGCD_CP | 50.60% | 41.17% | 65.65% | 27.30%
2018-08-23 | Sogou_MM | 50.35% | 40.40% | 66.81% | 44.72%
2019-09-18 | mask RCNN Augment+ | 50.23% | 45.24% | 56.46% | 36.66%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 49.96% | 40.58% | 65.00% | 25.68%
2019-05-30 | Thesis-SE | 45.58% | 37.34% | 58.49% | 23.54%
2018-12-13 | AutoCV | 45.01% | 33.36% | 69.15% | 41.95%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 44.38% | 32.88% | 68.23% | 21.78%
2018-12-05 | EPTN-SJTU | 43.88% | 36.03% | 56.10% | 21.90%
2017-06-29 | SARI_FDU_RRPN_v1 | 39.40% | 32.19% | 50.76% | 24.45%
2022-01-05 | dbnet_resnet18 | 38.87% | 29.97% | 55.30% | 29.60%
2017-06-28 | SARI_FDU_RRPN_v0 | 38.55% | 28.94% | 57.72% | 30.92%
2017-06-30 | TH-DL | 32.73% | 30.16% | 35.78% | 20.68%
2019-01-03 | YY AI OCR Group | 31.43% | 25.24% | 41.65% | 13.07%
2017-06-30 | Sensetime OCR | 27.39% | 17.11% | 68.68% | 44.78%
2017-06-30 | linkage-ER-Flow | 14.40% | 10.71% | 21.99% | 4.02%
2019-10-14 | TextSnake | 6.15% | 4.19% | 11.55% | 0.50%
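
The Hmean column appears to be the harmonic mean of the Precision and Recall columns. The sketch below is not the competition's evaluation code; it only recomputes Hmean from the rounded percentages of the top three rows, so results should agree with the table up to rounding. The function name `hmean` and the `rows` list are illustrative names, not part of the page.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        # Avoid division by zero when a method detects nothing correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %) copied from the first three table rows.
rows = [
    ("TH", 62.53, 81.97),
    ("Sogou_OCR", 60.97, 80.46),
    ("AntAI-Cognition", 59.02, 81.04),
]

for name, p, r in rows:
    print(f"{name}: Hmean = {hmean(p, r):.2f}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method with high recall but low precision (for example Sensetime OCR, 17.11% precision and 68.68% recall) ranks well below methods with a more balanced pair.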