method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. A ResNeXt-101 backbone and multi-scale training/testing are used.
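
Several entries in this ranking rely on multi-scale testing. As a rough illustration only (not the TH team's actual pipeline), the sketch below runs an assumed `detector` callable at a few image scales, maps the boxes back to the original resolution, and merges the pooled detections with NMS; the scale set and IoU threshold are made-up defaults, and axis-aligned boxes are used for simplicity.

```python
# Hedged sketch of generic multi-scale testing; `detector` is an assumed
# callable mapping an image to (boxes Nx4 in [x1,y1,x2,y2], scores N).
import numpy as np
import cv2

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression on axis-aligned boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou < iou_thr]
    return keep

def multiscale_detect(image, detector, scales=(0.5, 1.0, 1.5), iou_thr=0.5):
    """Run the detector at several scales, rescale boxes to the original
    resolution, and merge the pooled detections with NMS."""
    all_boxes, all_scores = [], []
    for s in scales:
        resized = cv2.resize(image, None, fx=s, fy=s)
        boxes, scores = detector(resized)
        all_boxes.append(np.asarray(boxes, dtype=float) / s)
        all_scores.append(np.asarray(scores, dtype=float))
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```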

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone, and multi-scale training and testing are adopted to obtain the final results.
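
For arbitrary-shaped text, a Mask R-CNN style detector typically converts each predicted instance mask into a text polygon in post-processing. A minimal sketch of that step, assuming OpenCV 4 and a soft mask in [0, 1] (an illustration, not Sogou's implementation; the thresholds are arbitrary defaults):

```python
# Hedged sketch: instance mask -> simplified text polygon (assumes OpenCV >= 4).
import numpy as np
import cv2

def mask_to_polygon(mask, score_thr=0.5, epsilon_ratio=0.01):
    """Binarize a soft instance mask, take its largest external contour,
    and simplify it into a polygon given as an (N, 2) array of (x, y) points."""
    binary = (mask > score_thr).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)
    epsilon = epsilon_ratio * cv2.arcLength(contour, True)
    polygon = cv2.approxPolyDP(contour, epsilon, True)
    return polygon.reshape(-1, 2)
```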

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: Our approach is an ensemble of three text detection models. The detectors follow the Mask R-CNN framework [1], with different backbones (ResNeXt101-64x4d [2], CBNet [3], ResNeXt101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and select the highest-quality quadrilateral boxes from the outputs of all detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT dataset to the training data; both its training and validation sets are used to produce the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining. Proceedings of the European Conference on Computer Vision (ECCV), 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 2017: 3146-3154.
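
As a rough sketch of the GBDT re-scoring step described above (not the authors' code): candidate boxes pooled from the three detectors get simple hand-crafted features, a LightGBM classifier [5] is trained on matched/unmatched labels, and its calibrated probability replaces each detector's raw confidence before the final box selection. The feature choices, hyper-parameters, and the `box_features` helper are assumptions made for illustration.

```python
# Hedged sketch: LightGBM as the GBDT that normalizes confidences across
# detectors so their quadrilateral boxes become directly comparable.
import numpy as np
import lightgbm as lgb

def box_features(box, score, model_id, n_models=3):
    """Per-candidate features: raw score, geometry of the axis-aligned
    bounding rectangle of the quadrilateral, and a one-hot source-model id."""
    box = np.asarray(box, dtype=float)      # box = [x1, y1, ..., x4, y4]
    xs, ys = box[0::2], box[1::2]
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    one_hot = np.eye(n_models)[model_id]
    return np.concatenate([[score, w, h, w / max(h, 1e-6)], one_hot])

def fit_rescorer(features, labels):
    """Train the GBDT; labels mark candidates that match a ground-truth
    box (e.g. IoU > 0.5) as 1, all others as 0."""
    model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    model.fit(np.asarray(features), np.asarray(labels))
    return model

def rescore(model, features):
    """Calibrated probability of being a correct box, used in place of
    each detector's raw confidence before the final selection step."""
    return model.predict_proba(np.asarray(features))[:, 1]
```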

Ranking Table

Date        Method                                Hmean    Precision  Recall   Average Precision
2020-04-16  TH                                    44.92%   29.49%     94.22%   75.64%
2019-11-08  Sogou_OCR                             44.89%   29.13%     97.76%   85.12%
2020-04-22  AntAI-Cognition                       42.78%   27.46%     96.66%   84.29%
2018-11-20  Pixel-Anchor                          40.29%   26.10%     88.29%   51.88%
2019-03-29  GNNets (single scale)                 38.92%   25.45%     82.71%   34.04%
2019-08-08  JDAI                                  38.52%   24.15%     95.21%   77.19%
2019-05-30  PMTD                                  38.51%   24.22%     93.95%   82.23%
2019-05-08  Baidu-VIS                             38.13%   24.12%     91.00%   22.86%
2019-03-23  PMTD                                  37.55%   23.71%     90.18%   49.86%
2017-06-28  SCUT_DLVClab1                         36.60%   23.06%     88.68%   72.16%
2019-06-02  NJU-ImagineLab                        36.43%   22.49%     95.80%   82.09%
2018-10-29  Amap-CVLab                            35.12%   21.79%     90.53%   69.38%
2018-11-28  CRAFT                                 35.05%   22.27%     82.32%   19.53%
2019-06-11  4Paradigm-Data-Intelligence           33.95%   20.71%     94.15%   20.21%
2019-05-23  4Paradigm-Data-Intelligence           33.46%   20.43%     92.30%   19.04%
2018-05-18  PSENet_NJU_ImagineLab (single-scale)  33.21%   20.94%     80.16%   17.24%
2019-07-15  stela                                 32.40%   20.21%     81.69%   60.02%
2018-11-15  USTC-NELSLIP                          31.22%   18.74%     93.60%   81.67%
2018-12-04  SPCNet_TongJi & UESTC (multi scale)   30.98%   18.66%     91.16%   17.08%
2019-12-13  BDN                                   30.57%   18.26%     93.71%   18.50%
2017-11-09  EAST++                                28.99%   17.83%     77.49%   22.17%
2017-06-30  TH-DL                                 28.58%   17.37%     80.63%   52.72%
2018-03-12  ATL Cangjie OCR                       27.93%   16.56%     89.12%   60.12%
2019-01-08  ALGCD_CP                              27.75%   16.50%     87.23%   17.27%
2017-06-29  SARI_FDU_RRPN_v1                      26.38%   15.53%     87.39%   61.20%
2018-12-05  EPTN-SJTU                             25.29%   14.98%     81.02%   20.12%
2019-05-30  Thesis-SE                             24.04%   14.24%     77.13%   14.34%
2018-12-03  SPCNet_TongJi & UESTC (single scale)  22.24%   12.62%     93.56%   11.97%
2017-06-28  SARI_FDU_RRPN_v0                      21.52%   12.36%     83.34%   43.90%
2017-06-30  Sensetime OCR                         10.32%    5.46%     93.44%   60.68%
2017-06-30  linkage-ER-Flow                        3.20%    1.78%     15.68%    0.38%
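
For reference, Hmean in the table is the harmonic mean of precision and recall; the snippet below reproduces the top row (TH) from its listed precision and recall.

```python
# Hmean = 2 * P * R / (P + R), checked against the TH row of the table.
p, r = 0.2949, 0.9422
hmean = 2 * p * r / (p + r)
print(f"{hmean:.2%}")  # -> 44.92%, matching the listed Hmean
```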

Ranking Graphic

(ranking plot omitted)