method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: Our approach is an ensemble of three text detection models. Each model follows the Mask R-CNN framework [1] with a different backbone: ResNeXt101-64x4d [2], CBNet [3], or ResNeXt101-32x32d_wsl [4]. A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training data; both training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
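
The selection step in the AntAI-Cognition description (pool the boxes from several detectors, put their scores on a common scale, keep the best box among overlapping candidates) can be pictured roughly as below. This is a minimal sketch and not the authors' implementation. It assumes axis-aligned boxes instead of quadrilaterals, replaces the trained GBDT with a hypothetical per-model linear calibration (`CALIBRATION`, values invented for illustration), and uses greedy IoU suppression as the selection rule. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); axis-aligned for simplicity
Detection = Tuple[Box, float]            # (box, raw confidence score)

# Hypothetical (scale, offset) per model. This stands in for the trained GBDT,
# whose features and targets are not given in the description.
CALIBRATION: Dict[str, Tuple[float, float]] = {
    "resnext101_64x4d": (1.0, 0.0),
    "cbnet": (0.9, 0.05),
    "resnext101_32x32d_wsl": (0.95, 0.0),
}


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_boxes(detections: Dict[str, List[Detection]],
                 iou_thresh: float = 0.5) -> List[Detection]:
    """Pool detections from all models, calibrate scores, keep the best of each overlap group."""
    pooled: List[Detection] = []
    for model, dets in detections.items():
        scale, offset = CALIBRATION[model]
        for box, score in dets:
            pooled.append((box, scale * score + offset))

    # Greedy suppression: visit boxes from highest to lowest calibrated score
    # and keep a box only if it does not overlap an already kept one.
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in pooled:
        if all(iou(box, kept_box) < iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


example = {
    "resnext101_64x4d": [((0.0, 0.0, 10.0, 10.0), 0.8)],
    "cbnet": [((1.0, 1.0, 11.0, 11.0), 0.9), ((50.0, 50.0, 60.0, 60.0), 0.7)],
}
selected = select_boxes(example)
```

In this toy input the two boxes near the origin overlap heavily, so only the one with the higher calibrated score survives, together with the isolated box. A learned model such as a GBDT could replace the linear calibration by predicting box quality from richer features, but which features the authors used is not stated.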

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone, and multi-scale training and testing are adopted to obtain the final results.
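
All three entries above mention multi-scale testing. In general this means running the detector on several resized copies of an image and mapping every detection back to the original coordinate frame before merging. The sketch below illustrates only that coordinate bookkeeping. It assumes a hypothetical `detect_at_scale` callable that returns quadrilaterals in the resized image's coordinates; the scales used by the participants are not stated, so the values here are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Quad = List[Tuple[float, float]]  # four (x, y) corners
Detection = Tuple[Quad, float]    # (quadrilateral, confidence score)


def to_original_coords(quad: Quad, scale: float) -> Quad:
    """Map a quadrilateral found on an image resized by `scale` back to the original image."""
    return [(x / scale, y / scale) for x, y in quad]


def multi_scale_detect(detect_at_scale: Callable[[float], List[Detection]],
                       scales: Sequence[float]) -> List[Detection]:
    """Run the detector at every scale and pool the results in original coordinates."""
    merged: List[Detection] = []
    for scale in scales:
        for quad, score in detect_at_scale(scale):
            merged.append((to_original_coords(quad, scale), score))
    return merged


# Stand-in detector for illustration: it always "finds" the same text region,
# expressed in the coordinates of the resized image.
BASE_QUAD: Quad = [(10.0, 20.0), (110.0, 20.0), (110.0, 50.0), (10.0, 50.0)]


def fake_detector(scale: float) -> List[Detection]:
    return [([(x * scale, y * scale) for x, y in BASE_QUAD], 0.9)]


pooled = multi_scale_detect(fake_detector, scales=[0.5, 1.0, 2.0])
```

The pooled list still contains one copy of each region per scale, so a real pipeline would follow this with a duplicate-removal step such as non-maximum suppression.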

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-22 | AntAI-Cognition | 84.36% | 85.92% | 82.86% | 78.41%
2020-04-16 | TH | 84.19% | 87.21% | 81.38% | 78.36%
2019-11-08 | Sogou_OCR | 83.74% | 87.22% | 80.54% | 76.87%
2019-08-08 | JDAI | 82.50% | 84.24% | 80.82% | 78.13%
2019-06-02 | NJU-ImagineLab | 82.40% | 83.20% | 81.62% | 77.98%
2021-03-21 | OSKDet | 81.98% | 85.90% | 78.40% | 75.08%
2019-05-30 | PMTD | 81.88% | 84.15% | 79.74% | 76.74%
2019-06-11 | 4Paradigm-Data-Intelligence | 81.07% | 81.85% | 80.30% | 65.57%
2019-05-08 | Baidu-VIS | 80.75% | 83.95% | 77.79% | 65.11%
2019-05-23 | 4Paradigm-Data-Intelligence | 80.62% | 81.80% | 79.47% | 64.81%
2019-03-23 | PMTD | 80.49% | 83.04% | 78.09% | 74.19%
2019-11-05 | baseline_maskrcnn | 79.76% | 83.19% | 76.59% | 71.75%
2021-05-03 | NCU_MSP | 79.30% | 83.30% | 75.66% | 63.01%
2024-03-14 | gts | 79.16% | 86.25% | 73.15% | 71.03%
2021-03-25 | NCU_MSP | 78.80% | 82.69% | 75.27% | 62.18%
2019-12-13 | BDN | 78.69% | 79.18% | 78.20% | 61.94%
2019-08-20 | juxinli | 78.47% | 82.52% | 74.80% | 71.03%
2021-11-02 | fpa | 78.43% | 82.45% | 74.78% | 71.00%
2021-05-03 | adapt | 77.93% | 79.00% | 76.89% | 60.69%
2019-03-19 | ccnet single scale | 77.55% | 81.30% | 74.14% | 64.94%
2021-05-17 | NCU_FPN | 77.29% | 77.03% | 77.55% | 59.59%
2018-10-29 | Amap-CVLab | 77.20% | 79.64% | 74.91% | 70.51%
2020-09-28 | DCLNet | 77.08% | 81.22% | 73.33% | 59.69%
2018-12-22 | PKU_VDIG | 77.07% | 76.01% | 78.16% | 73.09%
2019-03-29 | GNNets (single scale) | 76.90% | 82.75% | 71.83% | 64.55%
2018-11-15 | USTC-NELSLIP | 76.88% | 77.47% | 76.30% | 71.29%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 76.80% | 81.07% | 72.96% | 59.35%
2018-11-28 | CRAFT | 76.71% | 81.30% | 72.60% | 59.13%
2021-12-12 | a | 76.15% | 77.94% | 74.43% | 58.21%
2018-11-20 | Pixel-Anchor | 76.04% | 83.58% | 69.75% | 58.14%
2020-10-16 | Drew | 75.82% | 81.33% | 71.02% | 66.76%
2021-12-12 | b | 75.82% | 77.73% | 74.00% | 57.54%
2020-10-21 | gccnet-ensemble | 75.47% | 76.95% | 74.05% | 68.39%
2021-03-03 | NCU_MSP_light | 75.44% | 79.11% | 72.09% | 57.13%
2023-05-22 | DeepSolo++ (ResNet-50) | 75.00% | 83.11% | 68.34% | 65.70%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 74.94% | 78.55% | 71.65% | 56.39%
2018-08-23 | Sogou_MM | 74.75% | 76.63% | 72.96% | 67.72%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 74.29% | 77.17% | 71.63% | 55.07%
2020-12-08 | cascade | 74.16% | 81.69% | 67.91% | 64.79%
2017-11-09 | EAST++ | 73.88% | 78.90% | 69.45% | 56.21%
2019-07-15 | stela | 73.72% | 78.67% | 69.35% | 64.14%
2019-01-08 | ALGCD_CP | 73.18% | 76.52% | 70.12% | 56.41%
2018-03-12 | ATL Cangjie OCR | 73.04% | 75.47% | 70.76% | 65.19%
2022-04-11 | TextBPN++(ResNet-50) | 72.81% | 79.98% | 66.81% | 53.43%
2018-01-22 | FOTS_v2 | 72.40% | 79.67% | 66.35% | 58.38%
2021-12-31 | TextPMs | 71.97% | 78.79% | 66.23% | 52.35%
2020-12-08 | corner | 71.78% | 77.97% | 66.49% | 62.88%
2023-12-17 | mlt_ch_03 | 71.16% | 77.76% | 65.58% | 51.19%
2018-12-13 | AutoCV | 70.69% | 69.22% | 72.24% | 63.15%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 70.24% | 74.22% | 66.66% | 49.50%
2024-04-02 | FPDIoU | 68.82% | 85.08% | 57.79% | 49.25%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 68.08% | 68.13% | 68.02% | 46.20%
2018-12-05 | EPTN-SJTU | 67.69% | 73.30% | 62.87% | 49.91%
2019-05-30 | Thesis-SE | 66.83% | 72.60% | 61.92% | 47.65%
2019-09-18 | mask RCNN Augment+ | 65.26% | 78.19% | 56.01% | 51.05%
2017-06-28 | SCUT_DLVClab1 | 63.21% | 76.76% | 53.73% | 48.24%
2017-06-29 | SARI_FDU_RRPN_v1 | 62.25% | 68.90% | 56.77% | 51.75%
2022-01-05 | dbnet_resnet18 | 61.24% | 64.47% | 58.31% | 49.30%
2017-06-28 | SARI_FDU_RRPN_v0 | 59.65% | 65.01% | 55.12% | 48.79%
2017-06-30 | Sensetime OCR | 57.74% | 48.74% | 70.83% | 60.84%
2019-01-03 | YY AI OCR Group | 51.96% | 60.99% | 45.26% | 29.73%
2017-06-30 | TH-DL | 43.38% | 62.62% | 33.18% | 29.50%
2017-06-30 | linkage-ER-Flow | 29.78% | 36.84% | 24.99% | 13.62%
2019-10-14 | TextSnake | 17.52% | 20.40% | 15.36% | 3.13%
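
The Hmean column is the harmonic mean of Precision and Recall, Hmean = 2PR / (P + R). The short check below recomputes it for the top three rows of the table. Because the published Precision and Recall are already rounded to two decimals, the recomputed value can differ from the listed Hmean in the last digit. The function and variable names are illustrative only.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %, Hmean % as listed in the table)
ROWS = [
    ("AntAI-Cognition", 85.92, 82.86, 84.36),
    ("TH", 87.21, 81.38, 84.19),
    ("Sogou_OCR", 87.22, 80.54, 83.74),
]

for method, precision, recall, listed in ROWS:
    recomputed = hmean(precision, recall)
    print(f"{method}: recomputed {recomputed:.2f}%, listed {listed:.2f}%")
```

The same relation is useful for splitting rows where the method name and the Hmean value run together in a raw page dump: only one split point reproduces the listed Precision and Recall.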

Ranking Graphic
