- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.
method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.
method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. For training, we also add the ICDAR19 MLT dataset; both its training and validation sets are used to get the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
Date | Method | Hmean | Precision | Recall | Average Precision
---|---|---|---|---|---
2020-04-16 | TH | 44.92% | 29.49% | 94.22% | 75.64%
2019-11-08 | Sogou_OCR | 44.89% | 29.13% | 97.76% | 85.12%
2020-04-22 | AntAI-Cognition | 42.78% | 27.46% | 96.66% | 84.29%
2018-11-20 | Pixel-Anchor | 40.29% | 26.10% | 88.29% | 51.88%
2019-03-29 | GNNets (single scale) | 38.92% | 25.45% | 82.71% | 34.04%
2019-08-08 | JDAI | 38.52% | 24.15% | 95.21% | 77.19%
2019-05-30 | PMTD | 38.51% | 24.22% | 93.95% | 82.23%
2019-05-08 | Baidu-VIS | 38.13% | 24.12% | 91.00% | 22.86%
2019-03-23 | PMTD | 37.55% | 23.71% | 90.18% | 49.86%
2017-06-28 | SCUT_DLVClab1 | 36.60% | 23.06% | 88.68% | 72.16%
2019-06-02 | NJU-ImagineLab | 36.43% | 22.49% | 95.80% | 82.09%
2018-10-29 | Amap-CVLab | 35.12% | 21.79% | 90.53% | 69.38%
2018-11-28 | CRAFT | 35.05% | 22.27% | 82.32% | 19.53%
2019-06-11 | 4Paradigm-Data-Intelligence | 33.95% | 20.71% | 94.15% | 20.21%
2019-05-23 | 4Paradigm-Data-Intelligence | 33.46% | 20.43% | 92.30% | 19.04%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 33.21% | 20.94% | 80.16% | 17.24%
2019-07-15 | stela | 32.40% | 20.21% | 81.69% | 60.02%
2018-11-15 | USTC-NELSLIP | 31.22% | 18.74% | 93.60% | 81.67%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 30.98% | 18.66% | 91.16% | 17.08%
2019-12-13 | BDN | 30.57% | 18.26% | 93.71% | 18.50%
2017-11-09 | EAST++ | 28.99% | 17.83% | 77.49% | 22.17%
2017-06-30 | TH-DL | 28.58% | 17.37% | 80.63% | 52.72%
2018-03-12 | ATL Cangjie OCR | 27.93% | 16.56% | 89.12% | 60.12%
2019-01-08 | ALGCD_CP | 27.75% | 16.50% | 87.23% | 17.27%
2017-06-29 | SARI_FDU_RRPN_v1 | 26.38% | 15.53% | 87.39% | 61.20%
2018-12-05 | EPTN-SJTU | 25.29% | 14.98% | 81.02% | 20.12%
2019-05-30 | Thesis-SE | 24.04% | 14.24% | 77.13% | 14.34%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 22.24% | 12.62% | 93.56% | 11.97%
2017-06-28 | SARI_FDU_RRPN_v0 | 21.52% | 12.36% | 83.34% | 43.90%
2017-06-30 | Sensetime OCR | 10.32% | 5.46% | 93.44% | 60.68%
2017-06-30 | linkage-ER-Flow | 3.20% | 1.78% | 15.68% | 0.38%
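
The Hmean column appears to be the harmonic mean of the Precision and Recall columns; the TH and Sogou_OCR rows are consistent with that reading. The page does not state the formula, so the following is only a minimal sketch under that assumption, and the function name `hmean` is illustrative rather than part of any evaluation toolkit.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent.

    Assumption: the leaderboard's Hmean is 2*P*R / (P + R).
    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    total = precision + recall
    if total == 0:
        return 0.0
    return 2.0 * precision * recall / total


# Precision and recall copied from the first two table rows.
rows = {
    "TH": (29.49, 94.22),
    "Sogou_OCR": (29.13, 97.76),
}

for method, (p, r) in rows.items():
    print(f"{method}: Hmean = {hmean(p, r):.2f}%")
```

Because the harmonic mean is dominated by the smaller of its two inputs, the ranking in this table tracks precision closely even though most methods report recall above 80%.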