method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1] with different backbones (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. For training, we also add the ICDAR19 MLT datasets; both the training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
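The box-selection step of the AntAI-Cognition ensemble can be illustrated with a rough sketch. The entry does not give implementation details, so everything here is an assumption made for illustration: boxes are axis-aligned rectangles rather than quadrilaterals, the scores are taken to be already normalized to a comparable range (the role the description assigns to the GBDT), and the hypothetical `select_best_boxes` helper uses a plain greedy overlap filter instead of a learned selector.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); axis-aligned for simplicity
Detection = Tuple[Box, float]            # (box, quality score assumed comparable across models)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_best_boxes(per_model: List[List[Detection]],
                      iou_thr: float = 0.5) -> List[Detection]:
    """Pool detections from all models and keep the best-scoring box per overlapping group."""
    pooled = sorted((d for dets in per_model for d in dets),
                    key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in pooled:
        # Keep a box only if it does not overlap a better box that was already kept.
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Two hypothetical models that agree on one text region and disagree on the others.
model_a = [((0, 0, 10, 10), 0.90), ((20, 20, 30, 30), 0.60)]
model_b = [((1, 1, 11, 11), 0.80), ((50, 50, 60, 60), 0.70)]
merged = select_best_boxes([model_a, model_b])
print(merged)
```

In this toy input the two overlapping boxes collapse to the higher-scoring one, and the non-overlapping boxes from both models are kept.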

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-16 | TH | 59.17% | 44.09% | 89.91% | 79.49%
2019-11-08 | Sogou_OCR | 57.05% | 42.33% | 87.46% | 68.55%
2020-04-22 | AntAI-Cognition | 56.68% | 41.26% | 90.51% | 74.90%
2024-04-02 | FPDIoU | 56.62% | 46.96% | 71.28% | 32.79%
2021-03-21 | OSKDet | 56.12% | 42.00% | 84.54% | 70.22%
2023-05-22 | DeepSolo++ (ResNet-50) | 55.30% | 41.29% | 83.70% | 74.98%
2019-03-29 | GNNets (single scale) | 52.80% | 39.41% | 79.97% | 47.89%
2018-11-20 | Pixel-Anchor | 52.70% | 39.01% | 81.16% | 42.19%
2019-08-08 | JDAI | 52.31% | 37.11% | 88.60% | 73.66%
2019-03-19 | ccnet single scale | 52.16% | 37.49% | 85.71% | 68.08%
2024-03-14 | gts | 52.12% | 39.47% | 76.72% | 64.64%
2020-09-28 | DCLNet | 51.90% | 37.60% | 83.76% | 32.20%
2019-05-30 | PMTD | 51.69% | 36.90% | 86.29% | 75.91%
2019-05-08 | Baidu-VIS | 51.66% | 37.18% | 84.62% | 30.96%
2021-05-03 | NCU_MSP | 51.65% | 37.83% | 81.37% | 30.93%
2019-03-23 | PMTD | 51.22% | 36.71% | 84.73% | 69.90%
2019-06-02 | NJU-ImagineLab | 50.41% | 35.04% | 89.81% | 75.59%
2021-03-25 | NCU_MSP | 50.15% | 36.25% | 81.33% | 29.96%
2019-08-20 | juxinli | 49.70% | 35.76% | 81.45% | 66.44%
2021-11-02 | fpa | 49.54% | 35.60% | 81.39% | 66.31%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 49.36% | 34.92% | 84.17% | 30.76%
2019-11-05 | baseline_maskrcnn | 48.55% | 34.43% | 82.30% | 60.07%
2020-12-08 | cascade | 48.00% | 34.38% | 79.50% | 65.20%
2019-06-11 | 4Paradigm-Data-Intelligence | 47.35% | 32.46% | 87.48% | 28.06%
2019-05-23 | 4Paradigm-Data-Intelligence | 47.34% | 32.51% | 87.05% | 28.48%
2018-01-22 | FOTS_v2 | 47.05% | 33.05% | 81.66% | 65.40%
2020-10-16 | Drew | 46.94% | 33.24% | 79.85% | 63.57%
2022-04-11 | TextBPN++(ResNet-50) | 46.72% | 33.91% | 75.09% | 26.78%
2018-10-29 | Amap-CVLab | 46.39% | 32.63% | 80.22% | 66.26%
2021-12-12 | a | 46.18% | 31.67% | 85.26% | 27.63%
2019-07-15 | stela | 46.09% | 32.67% | 78.26% | 57.95%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 45.98% | 32.86% | 76.53% | 25.15%
2021-05-03 | adapt | 45.10% | 30.97% | 82.98% | 26.45%
2021-12-12 | b | 44.99% | 30.58% | 85.08% | 26.55%
2017-11-09 | EAST++ | 44.21% | 30.80% | 78.34% | 27.79%
2019-09-18 | mask RCNN Augment+ | 43.89% | 31.61% | 71.78% | 48.67%
2019-12-13 | BDN | 43.62% | 29.16% | 86.52% | 25.00%
2021-12-31 | TextPMs | 43.52% | 30.59% | 75.42% | 23.82%
2018-11-28 | CRAFT | 43.46% | 31.91% | 68.09% | 22.40%
2018-11-15 | USTC-NELSLIP | 43.44% | 29.18% | 84.97% | 69.13%
2017-06-28 | SCUT_DLVClab1 | 43.07% | 31.35% | 68.79% | 47.05%
2020-10-21 | gccnet-ensemble | 42.81% | 28.73% | 83.96% | 58.86%
2021-03-03 | NCU_MSP_light | 42.44% | 28.95% | 79.50% | 23.49%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 41.71% | 28.26% | 79.62% | 22.37%
2020-12-08 | corner | 41.36% | 28.55% | 75.01% | 54.07%
2018-08-23 | Sogou_MM | 41.00% | 27.07% | 84.50% | 67.28%
2018-12-22 | PKU_VDIG | 40.56% | 26.79% | 83.53% | 66.55%
2021-05-17 | NCU_FPN | 40.42% | 26.75% | 82.67% | 22.65%
2023-12-17 | mlt_ch_03 | 40.26% | 27.92% | 72.17% | 21.17%
2018-03-12 | ATL Cangjie OCR | 40.05% | 26.58% | 81.19% | 64.38%
2019-01-08 | ALGCD_CP | 39.92% | 26.57% | 80.24% | 24.09%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 39.66% | 26.77% | 76.49% | 20.54%
2018-12-05 | EPTN-SJTU | 38.53% | 25.66% | 77.32% | 21.78%
2019-05-30 | Thesis-SE | 37.74% | 25.09% | 76.08% | 21.23%
2017-06-29 | SARI_FDU_RRPN_v1 | 34.74% | 22.99% | 71.06% | 51.15%
2018-12-13 | AutoCV | 33.15% | 20.79% | 81.76% | 42.92%
2022-01-05 | dbnet_resnet18 | 33.13% | 21.38% | 73.55% | 44.17%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 31.65% | 19.72% | 80.12% | 15.98%
2017-06-28 | SARI_FDU_RRPN_v0 | 29.58% | 18.86% | 68.59% | 35.58%
2017-06-30 | TH-DL | 29.11% | 20.54% | 49.94% | 30.05%
2019-01-03 | YY AI OCR Group | 25.51% | 16.44% | 56.96% | 11.36%
2017-06-30 | Sensetime OCR | 16.23% | 9.03% | 80.09% | 47.00%
2019-10-14 | TextSnake | 5.35% | 3.07% | 20.52% | 0.64%
2017-06-30 | linkage-ER-Flow | 4.81% | 2.95% | 13.05% | 0.99%
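The page does not define the Hmean column, but the values are consistent with the usual harmonic mean of precision and recall, 2PR / (P + R). The short check below recomputes it for the top two rows; the `hmean` function name is only illustrative.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the first two rows of the ranking table.
th = hmean(44.09, 89.91)
sogou = hmean(42.33, 87.46)
print(f"TH: {th:.2f}%  Sogou_OCR: {sogou:.2f}%")
```

Because the ranking is by Hmean, a method with very high recall but low precision (for example Sensetime OCR at 80.09% recall and 9.03% precision) still lands near the bottom of the table.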

Ranking Graphic
