method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training data; both training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
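
The box-selection step described above can be illustrated with a rough sketch. This is not the authors' implementation: it assumes each detection already carries a normalized quality score (standing in for the GBDT output), simplifies quadrilaterals to axis-aligned boxes, and uses greedy overlap suppression as a hypothetical selection rule. The names `iou`, `select_best`, and `iou_thr` are illustrative only.

```python
from typing import List, Tuple

# Axis-aligned box (x1, y1, x2, y2); a simplification of the quadrilaterals
# mentioned in the description.
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, normalized quality score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_best(detections: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Keep the highest-scored box among each group of overlapping boxes.

    Assumption: scores from different models are already comparable,
    which is the role the description assigns to the GBDT.
    """
    kept: List[Detection] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Pool the outputs of three hypothetical models, then select.
model_a = [((0, 0, 10, 10), 0.9)]
model_b = [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
model_c = [((0, 0, 10, 10), 0.6)]
merged = select_best(model_a + model_b + model_c)
print(merged)
```

In this toy input the three overlapping boxes near the origin collapse to the single highest-scored one, while the isolated box from the second model survives.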

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-16 | TH | 70.94% | 62.53% | 81.97% | 67.52%
2019-11-08 | Sogou_OCR | 69.37% | 60.97% | 80.46% | 65.51%
2020-04-22 | AntAI-Cognition | 68.30% | 59.02% | 81.04% | 56.02%
2021-03-21 | OSKDet | 67.10% | 59.52% | 76.89% | 63.68%
2019-05-30 | PMTD | 64.77% | 55.26% | 78.23% | 63.79%
2019-08-08 | JDAI | 64.38% | 55.33% | 76.96% | 61.58%
2019-05-08 | Baidu-VIS | 64.31% | 55.42% | 76.59% | 41.52%
2023-05-22 | DeepSolo++ (ResNet-50) | 63.06% | 57.33% | 70.07% | 56.41%
2019-03-23 | PMTD | 63.05% | 53.85% | 76.05% | 61.26%
2019-11-05 | baseline_maskrcnn | 62.72% | 53.14% | 76.52% | 59.67%
2024-03-14 | gts | 62.65% | 58.22% | 67.80% | 54.49%
2020-09-28 | DCLNet | 62.38% | 53.15% | 75.48% | 40.36%
2019-08-20 | juxinli | 62.26% | 53.63% | 74.20% | 59.85%
2019-06-11 | 4Paradigm-Data-Intelligence | 62.20% | 50.86% | 80.06% | 39.66%
2021-11-02 | fpa | 62.15% | 53.49% | 74.17% | 59.74%
2021-05-03 | NCU_MSP | 61.95% | 54.25% | 72.20% | 39.75%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 61.79% | 51.90% | 76.32% | 40.10%
2019-06-02 | NJU-ImagineLab | 61.30% | 51.50% | 75.70% | 58.97%
2019-05-23 | 4Paradigm-Data-Intelligence | 61.04% | 50.33% | 77.53% | 37.92%
2021-03-25 | NCU_MSP | 60.86% | 52.74% | 71.93% | 38.40%
2021-12-12 | a | 58.54% | 47.82% | 75.46% | 36.26%
2022-04-11 | TextBPN++(ResNet-50) | 58.53% | 50.16% | 70.25% | 35.89%
2021-12-31 | TextPMs | 57.74% | 48.46% | 71.44% | 35.14%
2020-12-08 | cascade | 57.71% | 50.22% | 67.83% | 50.04%
2019-03-19 | ccnet single scale | 57.62% | 49.65% | 68.63% | 47.01%
2021-12-12 | b | 57.40% | 46.64% | 74.62% | 34.91%
2021-05-03 | adapt | 56.87% | 46.75% | 72.57% | 34.16%
2020-10-16 | Drew | 56.74% | 48.75% | 67.86% | 51.43%
2018-11-20 | Pixel-Anchor | 56.66% | 50.77% | 64.09% | 34.21%
2019-03-29 | GNNets (single scale) | 56.54% | 49.30% | 66.29% | 44.67%
2019-12-13 | BDN | 56.27% | 44.91% | 75.30% | 32.80%
2018-10-29 | Amap-CVLab | 56.25% | 47.01% | 70.00% | 49.59%
2018-11-15 | USTC-NELSLIP | 55.58% | 43.95% | 75.59% | 60.29%
2018-01-22 | FOTS_v2 | 55.33% | 47.58% | 66.08% | 46.29%
2018-11-28 | CRAFT | 55.23% | 48.07% | 64.89% | 30.47%
2021-03-03 | NCU_MSP_light | 55.10% | 45.57% | 69.64% | 32.14%
2023-12-17 | mlt_ch_03 | 54.84% | 45.99% | 67.91% | 31.73%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 54.57% | 45.44% | 68.28% | 30.21%
2021-05-17 | NCU_FPN | 53.91% | 42.62% | 73.33% | 31.40%
2018-12-22 | PKU_VDIG | 53.84% | 41.91% | 75.29% | 54.46%
2019-07-15 | stela | 53.32% | 45.31% | 64.76% | 45.50%
2018-03-12 | ATL Cangjie OCR | 53.18% | 42.20% | 71.88% | 56.17%
2020-12-08 | corner | 52.36% | 43.72% | 65.26% | 46.69%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 52.06% | 43.26% | 65.38% | 27.39%
2017-06-28 | SCUT_DLVClab1 | 51.27% | 45.80% | 58.24% | 38.59%
2020-10-21 | gccnet-ensemble | 51.26% | 41.40% | 67.26% | 42.97%
2017-11-09 | EAST++ | 51.07% | 43.27% | 62.30% | 29.75%
2019-01-08 | ALGCD_CP | 50.60% | 41.17% | 65.65% | 27.30%
2018-08-23 | Sogou_MM | 50.35% | 40.40% | 66.81% | 44.72%
2019-09-18 | mask RCNN Augment+ | 50.23% | 45.24% | 56.46% | 36.66%
2024-04-02 | FPDIoU | 50.14% | 56.15% | 45.29% | 26.15%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 49.96% | 40.58% | 65.00% | 25.68%
2019-05-30 | Thesis-SE | 45.58% | 37.34% | 58.49% | 23.54%
2018-12-13 | AutoCV | 45.01% | 33.36% | 69.15% | 41.95%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 44.38% | 32.88% | 68.23% | 21.78%
2018-12-05 | EPTN-SJTU | 43.88% | 36.03% | 56.10% | 21.90%
2017-06-29 | SARI_FDU_RRPN_v1 | 39.40% | 32.19% | 50.76% | 24.45%
2022-01-05 | dbnet_resnet18 | 38.87% | 29.97% | 55.30% | 29.60%
2017-06-28 | SARI_FDU_RRPN_v0 | 38.55% | 28.94% | 57.72% | 30.92%
2017-06-30 | TH-DL | 32.73% | 30.16% | 35.78% | 20.68%
2019-01-03 | YY AI OCR Group | 31.43% | 25.24% | 41.65% | 13.07%
2017-06-30 | Sensetime OCR | 27.39% | 17.11% | 68.68% | 44.78%
2017-06-30 | linkage-ER-Flow | 14.40% | 10.71% | 21.99% | 4.02%
2019-10-14 | TextSnake | 6.15% | 4.19% | 11.55% | 0.50%
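
The Hmean column appears to be the harmonic mean of Precision and Recall; the table does not say so explicitly, but the listed rows are consistent with that reading. A minimal sketch that recomputes it for the top three entries, with `hmean` as an illustrative helper name:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the ranking table.
rows = {
    "TH": (62.53, 81.97, 70.94),
    "Sogou_OCR": (60.97, 80.46, 69.37),
    "AntAI-Cognition": (59.02, 81.04, 68.30),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {hmean(p, r):.2f}%, reported {reported:.2f}%")
```

Because the table shows precision and recall already rounded to two decimals, a recomputed value can differ from the reported Hmean in the last digit.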

Ranking Graphic
