method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. A ResNeXt-101 backbone and multi-scale training/testing are used.
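The snippet below is a minimal sketch of the multi-scale testing step mentioned above, assuming a PyTorch/torchvision setup. It uses the stock maskrcnn_resnet50_fpn model as a stand-in, since the submission's ResNeXt-101 backbone and Transformer recognition head are not available here; only the multi-scale inference and box merging are illustrated.

```python
# Hedged sketch of multi-scale testing around an off-the-shelf Mask R-CNN.
# The submission's ResNeXt-101 backbone and Transformer head are not
# reproduced; torchvision's ResNet-50 FPN model is a stand-in.
import torch
import torchvision
from torchvision.transforms.functional import resize, to_tensor
from torchvision.ops import nms

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_multiscale(pil_image, scales=(0.5, 1.0, 1.5), iou_thr=0.5):
    """Run the detector at several image scales and merge the boxes with NMS."""
    w, h = pil_image.size
    all_boxes, all_scores = [], []
    with torch.no_grad():
        for s in scales:
            img = resize(pil_image, [int(h * s), int(w * s)])
            out = model([to_tensor(img)])[0]
            all_boxes.append(out["boxes"] / s)   # map boxes back to the original scale
            all_scores.append(out["scores"])
    boxes = torch.cat(all_boxes)
    scores = torch.cat(all_scores)
    keep = nms(boxes, scores, iou_thr)           # merge duplicates found at different scales
    return boxes[keep], scores[keep]
```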

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as our backbone, and multi-scale training and testing are adopted to obtain the final results.
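As an illustration of how a Mask R-CNN-based detector yields arbitrary-shaped text regions, the sketch below converts a predicted instance mask into a minimum-area rotated quadrilateral with OpenCV. This is a common post-processing choice and an assumption here, not necessarily the team's exact pipeline.

```python
# Hedged sketch: turn a soft Mask R-CNN instance mask into a text quadrilateral.
# Assumes OpenCV 4.x (findContours returns two values).
import cv2
import numpy as np

def mask_to_quad(mask, score_thr=0.5):
    """Binarize a soft instance mask and fit a minimum-area rotated box."""
    binary = (mask > score_thr).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)   # keep the largest connected blob
    rect = cv2.minAreaRect(contour)                # ((cx, cy), (w, h), angle)
    quad = cv2.boxPoints(rect)                     # 4 x 2 array of corner points
    return quad.astype(np.float32)
```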

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The detection models mainly follow the Mask R-CNN framework [1], with different backbones (ResNeXt101-64x4d [2], CBNet [3], ResNeXt101-32x32d_wsl [4]). A GBDT [5] is trained to normalize the confidence scores and select the highest-quality quadrilateral boxes from all detection models' outputs (a sketch of this re-scoring step follows the references below). Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT dataset to the training data; both its training and validation splits are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
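The sketch below illustrates the GBDT re-scoring step described above, using LightGBM [5]. The feature set (raw detector score, source-model id, simple box geometry) is an assumption for illustration; the team's actual features and labeling scheme are not published here.

```python
# Hedged sketch of GBDT-based confidence normalization over an ensemble of detectors.
# Candidate boxes from all detectors are re-scored by a LightGBM classifier; the
# feature set below is hypothetical.
import numpy as np
import lightgbm as lgb

def box_features(quad, score, model_id):
    """Hypothetical features for one detected quadrilateral (4x2 array of corners)."""
    xs, ys = quad[:, 0], quad[:, 1]
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    return [score, model_id, w, h, w * h, w / max(h, 1e-6)]

def fit_rescorer(candidates, labels):
    """candidates: list of (quad, raw score, source-model id);
    labels: 1 if the candidate matches a ground-truth box (e.g. IoU >= 0.5), else 0."""
    X = np.array([box_features(q, s, m) for q, s, m in candidates])
    gbdt = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    gbdt.fit(X, np.array(labels))
    return gbdt

def rescore(gbdt, candidates):
    """Replace each detector's raw score with the GBDT's calibrated probability,
    so boxes from different models become comparable before the final selection."""
    X = np.array([box_features(q, s, m) for q, s, m in candidates])
    return gbdt.predict_proba(X)[:, 1]
```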

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-16 | TH | 55.77% | 41.01% | 87.12% | 73.23%
2019-11-08 | Sogou_OCR | 53.96% | 39.23% | 86.38% | 68.24%
2020-04-22 | AntAI-Cognition | 52.67% | 37.81% | 86.78% | 68.83%
2019-08-08 | JDAI | 49.37% | 34.52% | 86.65% | 74.80%
2019-05-30 | PMTD | 48.30% | 33.75% | 84.95% | 72.61%
2019-06-02 | NJU-ImagineLab | 47.22% | 32.46% | 86.61% | 68.80%
2019-03-29 | GNNets (single scale) | 47.15% | 33.90% | 77.45% | 51.73%
2019-05-08 | Baidu-VIS | 46.60% | 32.89% | 79.96% | 26.26%
2019-03-23 | PMTD | 45.64% | 32.09% | 78.99% | 52.51%
2018-11-20 | Pixel-Anchor | 44.36% | 32.15% | 71.54% | 27.03%
2019-06-11 | 4Paradigm-Data-Intelligence | 43.11% | 29.07% | 83.37% | 24.42%
2019-05-23 | 4Paradigm-Data-Intelligence | 42.96% | 29.06% | 82.32% | 24.00%
2017-06-28 | SCUT_DLVClab1 | 42.20% | 29.86% | 71.92% | 52.71%
2018-10-29 | Amap-CVLab | 41.86% | 28.52% | 78.63% | 51.64%
2019-12-13 | BDN | 41.08% | 26.95% | 86.32% | 23.12%
2018-11-28 | CRAFT | 40.43% | 28.37% | 70.34% | 19.96%
2017-11-09 | EAST++ | 40.08% | 27.29% | 75.47% | 34.10%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 39.63% | 27.08% | 73.87% | 20.21%
2018-11-15 | USTC-NELSLIP | 38.09% | 24.99% | 80.04% | 46.23%
2019-01-08 | ALGCD_CP | 36.41% | 23.88% | 76.63% | 32.94%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 35.96% | 23.95% | 72.19% | 17.42%
2018-12-05 | EPTN-SJTU | 33.30% | 21.52% | 73.52% | 33.46%
2019-05-30 | Thesis-SE | 32.77% | 21.27% | 71.39% | 30.12%
2018-03-12 | ATL Cangjie OCR | 32.11% | 20.84% | 69.98% | 35.50%
2019-07-15 | stela | 32.07% | 22.31% | 57.05% | 28.65%
2017-06-29 | SARI_FDU_RRPN_v1 | 30.15% | 19.23% | 69.71% | 40.70%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 27.97% | 17.16% | 75.68% | 13.15%
2017-06-28 | SARI_FDU_RRPN_v0 | 25.27% | 15.60% | 66.48% | 36.44%
2017-06-30 | TH-DL | 25.09% | 16.98% | 48.08% | 20.81%
2017-06-30 | linkage-ER-Flow | 17.34% | 10.30% | 54.72% | 12.37%
2017-06-30 | Sensetime OCR | 14.98% | 8.27% | 80.04% | 40.38%
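Hmean in the table is the harmonic mean of Precision and Recall. As a quick check, the values of the top-ranked entry reproduce its reported Hmean:

```python
# Sanity check: Hmean = 2 * P * R / (P + R), using the top-ranked entry (TH).
p, r = 0.4101, 0.8712
hmean = 2 * p * r / (p + r)
print(f"{hmean:.2%}")   # 55.77%
```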

Ranking Graphic