method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer, using a ResNeXt-101 backbone with multi-scale training and testing.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.
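Several entries, including TH and Sogou_OCR, use multi-scale testing, but the descriptions do not say how detections from different scales are combined. A common approach is to map each scale's detections back to original-image coordinates and merge them with non-maximum suppression (NMS). The Python sketch below illustrates that approach under simplifying assumptions: axis-aligned boxes instead of the arbitrary-shaped masks these methods predict, a hypothetical `merge_multiscale` helper, and an illustrative IoU threshold of 0.5.

```python
from typing import Dict, List, Tuple

# (x1, y1, x2, y2, score); axis-aligned boxes are a simplifying assumption
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], thresh: float = 0.5) -> List[Box]:
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept


def merge_multiscale(per_scale: Dict[float, List[Box]], thresh: float = 0.5) -> List[Box]:
    """Hypothetical helper: per_scale maps a resize factor s (resized = original * s)
    to the boxes detected on the resized image. Boxes are divided by s to return
    to original-image coordinates, pooled, and merged with NMS."""
    pooled: List[Box] = []
    for s, boxes in per_scale.items():
        for x1, y1, x2, y2, score in boxes:
            pooled.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    return nms(pooled, thresh)


if __name__ == "__main__":
    detections = {
        1.0: [(10, 10, 50, 30, 0.9)],
        2.0: [(20, 20, 100, 60, 0.8), (200, 200, 260, 240, 0.7)],
    }
    print(merge_multiscale(detections))
    # [(10.0, 10.0, 50.0, 30.0, 0.9), (100.0, 100.0, 130.0, 120.0, 0.7)]
```

In the usage example, the first box found at scale 2.0 lands exactly on the scale-1.0 detection once rescaled, so NMS suppresses it; the second survives in original-image coordinates.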

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group and PKU. Our approach is an ensemble of three text detection models, which mainly follow the Mask R-CNN framework [1] but use different backbones: ResNext101-64x4d [2], CBNet [3], and ResNext101-32x32d_wsl [4]. A GBDT [5] is trained to normalize the confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT dataset to the training data; both the training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
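The AntAI-Cognition entry pools the outputs of three detectors and uses a trained GBDT to normalize their confidence scores before keeping the best quadrilaterals. The Python sketch below shows the general shape of such a pipeline under stated assumptions: the GBDT is replaced by a hypothetical fixed logistic calibration per model (`CALIBRATION`, with made-up parameters), quadrilaterals are assumed convex, and overlapping candidates are resolved by greedy suppression at IoU 0.5, which the entry does not specify. Quadrilateral IoU is computed with Sutherland-Hodgman polygon clipping.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Quad = List[Point]  # 4 vertices in order; assumed convex


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2.0


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); >= 0 means b is left of or on o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersection of a polygon with a convex polygon."""
    if signed_area(clipper) < 0:
        clipper = clipper[::-1]  # the inside test below expects CCW order
    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inp, output = output, []
        for i, cur in enumerate(inp):
            prev = inp[i - 1]
            c_prev, c_cur = cross(a, b, prev), cross(a, b, cur)
            if (c_prev >= 0) != (c_cur >= 0):
                # segment prev->cur crosses the clip line: add the crossing point
                t = c_prev / (c_prev - c_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if c_cur >= 0:
                output.append(cur)
    return output


def quad_iou(p: Quad, q: Quad) -> float:
    """IoU of two convex quadrilaterals."""
    inter = clip(p, q)
    inter_area = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(signed_area(p)) + abs(signed_area(q)) - inter_area
    return inter_area / union if union > 0 else 0.0


# Hypothetical per-model (slope, bias) of a logistic calibration; a trained
# GBDT would learn this mapping instead (possibly from richer features).
CALIBRATION: Dict[str, Tuple[float, float]] = {
    "model_a": (8.0, -4.0),
    "model_b": (6.0, -2.0),
}


def normalize(model: str, score: float) -> float:
    slope, bias = CALIBRATION[model]
    return 1.0 / (1.0 + math.exp(-(slope * score + bias)))


def ensemble(outputs: Dict[str, List[Tuple[Quad, float]]],
             thresh: float = 0.5) -> List[Tuple[Quad, float]]:
    """Pool all models' quads, then keep the best-normalized non-overlapping ones."""
    pooled = [(quad, normalize(m, s)) for m, dets in outputs.items() for quad, s in dets]
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept: List[Tuple[Quad, float]] = []
    for quad, score in pooled:
        if all(quad_iou(quad, k) < thresh for k, _ in kept):
            kept.append((quad, score))
    return kept


if __name__ == "__main__":
    sq_a = [(0, 0), (10, 0), (10, 4), (0, 4)]
    sq_b = [(1, 0), (11, 0), (11, 4), (1, 4)]
    far = [(50, 50), (60, 50), (60, 55), (50, 55)]
    kept = ensemble({"model_a": [(sq_a, 0.9)], "model_b": [(sq_b, 0.95), (far, 0.5)]})
    print([round(score, 3) for _, score in kept])  # [0.976, 0.731]
```

Calibrating first matters because raw scores from different backbones are not directly comparable; only after mapping them to a common scale does "keep the highest score" make sense across models.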

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-16 | TH | 58.66% | 49.57% | 71.82% | 49.46%
2019-11-08 | Sogou_OCR | 56.69% | 47.66% | 69.94% | 47.27%
2020-04-22 | AntAI-Cognition | 56.55% | 46.52% | 72.10% | 46.56%
2021-03-21 | OSKDet | 54.65% | 46.16% | 66.97% | 47.35%
2019-05-08 | Baidu-VIS | 53.38% | 42.87% | 70.72% | 29.94%
2019-05-30 | PMTD | 53.34% | 42.54% | 71.51% | 49.93%
2019-08-08 | JDAI | 53.26% | 42.56% | 71.13% | 50.82%
2019-06-02 | NJU-ImagineLab | 52.80% | 41.06% | 73.94% | 49.97%
2019-03-23 | PMTD | 50.87% | 40.87% | 67.37% | 45.30%
2024-03-14 | gts | 49.81% | 43.65% | 58.01% | 38.40%
2019-11-05 | baseline_maskrcnn | 49.44% | 39.46% | 66.18% | 41.30%
2021-05-03 | NCU_MSP | 49.43% | 40.43% | 63.59% | 25.46%
2019-06-11 | 4Paradigm-Data-Intelligence | 49.41% | 37.84% | 71.18% | 26.31%
2019-08-20 | juxinli | 48.99% | 39.66% | 64.07% | 43.08%
2019-05-23 | 4Paradigm-Data-Intelligence | 48.88% | 37.61% | 69.78% | 25.79%
2021-11-02 | fpa | 48.83% | 39.48% | 63.97% | 42.91%
2021-03-25 | NCU_MSP | 48.36% | 39.02% | 63.55% | 24.40%
2018-11-20 | Pixel-Anchor | 47.93% | 40.71% | 58.24% | 22.48%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 47.71% | 38.01% | 64.06% | 24.39%
2019-03-29 | GNNets (single scale) | 46.72% | 38.47% | 59.46% | 30.88%
2018-11-28 | CRAFT | 46.15% | 37.37% | 60.33% | 22.35%
2019-12-13 | BDN | 46.05% | 34.06% | 71.03% | 23.70%
2023-05-22 | DeepSolo++ (ResNet-50) | 45.81% | 40.74% | 52.32% | 33.27%
2020-09-28 | DCLNet | 45.59% | 37.12% | 59.08% | 22.03%
2020-12-08 | cascade | 45.57% | 37.18% | 58.85% | 36.89%
2021-12-12 | a | 45.34% | 34.44% | 66.35% | 22.94%
2018-10-29 | Amap-CVLab | 44.87% | 35.48% | 61.00% | 30.08%
2021-05-03 | adapt | 44.77% | 33.94% | 65.73% | 21.86%
2019-03-19 | ccnet single scale | 44.47% | 36.19% | 57.66% | 28.13%
2018-11-15 | USTC-NELSLIP | 44.42% | 32.85% | 68.55% | 38.69%
2021-12-12 | b | 44.41% | 33.56% | 65.64% | 21.87%
2020-10-16 | Drew | 44.31% | 35.67% | 58.47% | 33.52%
2022-04-11 | TextBPN++(ResNet-50) | 44.27% | 36.15% | 57.09% | 21.03%
2021-12-31 | TextPMs | 43.99% | 34.67% | 60.14% | 20.99%
2024-04-02 | FPDIoU | 43.95% | 44.50% | 43.41% | 20.19%
2023-12-17 | mlt_ch_03 | 43.73% | 33.68% | 62.30% | 21.02%
2017-11-09 | EAST++ | 43.15% | 33.57% | 60.37% | 27.28%
2021-03-03 | NCU_MSP_light | 42.95% | 32.62% | 62.87% | 20.29%
2021-05-17 | NCU_FPN | 42.75% | 31.01% | 68.79% | 20.94%
2018-12-22 | PKU_VDIG | 42.49% | 30.85% | 68.22% | 40.60%
2018-01-22 | FOTS_v2 | 41.99% | 33.62% | 55.89% | 24.21%
2020-12-08 | corner | 41.30% | 31.94% | 58.42% | 33.80%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 41.03% | 31.96% | 57.29% | 17.80%
2020-10-21 | gccnet-ensemble | 40.96% | 30.31% | 63.11% | 29.84%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 40.84% | 31.29% | 58.81% | 17.97%
2019-01-08 | ALGCD_CP | 40.45% | 30.10% | 61.65% | 26.49%
2018-08-23 | Sogou_MM | 39.59% | 29.24% | 61.32% | 34.74%
2019-07-15 | stela | 39.20% | 31.46% | 51.99% | 25.52%
2018-03-12 | ATL Cangjie OCR | 38.91% | 28.76% | 60.12% | 31.21%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 38.02% | 28.44% | 57.34% | 15.95%
2019-09-18 | mask RCNN Augment+ | 37.63% | 31.29% | 47.21% | 23.55%
2017-06-28 | SCUT_DLVClab1 | 37.02% | 31.48% | 44.93% | 25.34%
2019-05-30 | Thesis-SE | 34.72% | 25.80% | 53.07% | 21.64%
2018-12-05 | EPTN-SJTU | 34.48% | 25.57% | 52.91% | 21.71%
2018-12-13 | AutoCV | 33.27% | 22.65% | 62.61% | 27.75%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 30.87% | 21.16% | 57.04% | 11.89%
2017-06-29 | SARI_FDU_RRPN_v1 | 30.72% | 22.58% | 48.02% | 19.88%
2017-06-28 | SARI_FDU_RRPN_v0 | 28.73% | 19.91% | 51.53% | 24.29%
2022-01-05 | dbnet_resnet18 | 26.56% | 18.74% | 45.57% | 13.99%
2019-01-03 | YY AI OCR Group | 21.62% | 15.44% | 36.05% | 8.07%
2017-06-30 | TH-DL | 20.20% | 16.53% | 25.97% | 9.24%
2017-06-30 | Sensetime OCR | 18.68% | 10.93% | 64.03% | 27.49%
2017-06-30 | linkage-ER-Flow | 18.52% | 12.13% | 39.18% | 6.15%
2019-10-14 | TextSnake | 6.00% | 3.69% | 16.05% | 0.60%
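The Hmean column is the harmonic mean of precision and recall, Hmean = 2PR / (P + R). For example, the TH row gives 2 × 49.57 × 71.82 / (49.57 + 71.82) ≈ 58.66%. The page does not say how Average Precision is computed, so the `average_precision` function below uses one common non-interpolated definition as an assumption: rank detections by confidence, then average the precision at each true-positive rank over the number of ground-truth instances. The benchmark's actual matching rule and any interpolation may differ.

```python
from typing import List


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works on fractions or percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """Non-interpolated AP (assumed definition), returned as a fraction.

    is_tp lists the detections in descending confidence order; each entry says
    whether that detection matched a not-yet-matched ground-truth instance
    under some matching rule (e.g. IoU >= 0.5, also an assumption here).
    """
    if num_gt == 0:
        return 0.0
    tp = 0
    total = 0.0
    for rank, hit in enumerate(is_tp, start=1):
        if hit:
            tp += 1
            total += tp / rank  # precision at the rank of this true positive
    return total / num_gt


if __name__ == "__main__":
    print(round(hmean(49.57, 71.82), 2))  # 58.66, matching the TH row
    print(average_precision([True, False, True, True], num_gt=5))
```

Because Hmean is a harmonic mean, it penalizes imbalance: a high-recall, low-precision entry such as Sensetime OCR (10.93% precision, 64.03% recall) still ends up with an Hmean of only 18.68%.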
