method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The detectors mainly follow the Mask R-CNN framework [1] and differ in their backbones: ResNeXt101-64x4d [2], CBNet [3], and ResNeXt101-32x32d_wsl [4]. A GBDT [5] is trained to normalize the confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all the detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT data to the training set, and both the training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.

[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.

[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.

[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.

[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
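The entry does not spell out how the GBDT-normalized scores become a final set of boxes. One plausible reading, sketched below under stated assumptions, pools the quadrilaterals from all three detectors, rescores each one, and keeps the best-scored box among mutually overlapping candidates using greedy suppression on quadrilateral IoU (computed with Sutherland-Hodgman clipping). `score_fn` is a hypothetical placeholder for the trained GBDT, whose features are not described; the quadrilaterals are assumed convex; and `iou_thr` and `min_score` are illustrative defaults, not values from the submission.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Quad = List[Point]  # four (x, y) vertices; assumed convex

def _signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0

def _ccw(poly: Sequence[Point]) -> List[Point]:
    return list(poly) if _signed_area(poly) > 0 else list(reversed(poly))

def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies on or to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Point where segment p-q crosses the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def _clip(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by a convex, CCW `clipper`."""
    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    out.append(_intersect(prev, cur, a, b))
                out.append(cur)
            elif _inside(prev, a, b):
                out.append(_intersect(prev, cur, a, b))
            prev = cur
    return out

def quad_iou(q1: Quad, q2: Quad) -> float:
    a, b = _ccw(q1), _ccw(q2)
    inter_poly = _clip(a, b)
    inter = abs(_signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(_signed_area(a)) + abs(_signed_area(b)) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical stand-in for the trained GBDT: maps (model index, quad, raw score)
# to a calibrated score that is comparable across models.
ScoreFn = Callable[[int, Quad, float], float]

def ensemble_select(model_outputs: Sequence[Sequence[Tuple[Quad, float]]],
                    score_fn: ScoreFn, iou_thr: float = 0.5,
                    min_score: float = 0.5) -> List[Tuple[float, Quad]]:
    pooled = []
    for model_id, dets in enumerate(model_outputs):
        for quad, raw in dets:
            s = score_fn(model_id, quad, raw)
            if s >= min_score:
                pooled.append((s, quad))
    # Greedy suppression: keep the best-scored quad, drop overlapping rivals.
    pooled.sort(key=lambda item: item[0], reverse=True)
    kept: List[Tuple[float, Quad]] = []
    for s, quad in pooled:
        if all(quad_iou(quad, k) < iou_thr for _, k in kept):
            kept.append((s, quad))
    return kept
```

Any callable with the `ScoreFn` signature can be plugged in; `lambda m, q, s: s` simply reuses each detector's raw score, while a trained model would predict a per-box quality from features such as the raw score and box geometry.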

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: xiaosy19@mails.tsinghua.edu.cn (Shanyu Xiao)

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone, and multi-scale training and testing are adopted to obtain the final results.
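Multi-scale testing, mentioned by all three entries above, can be sketched as follows: run the detector on several resized copies of the image, map each detection back to original-image coordinates by dividing by the scale factor, and merge the pooled results with non-maximum suppression. This is a generic sketch, not Sogou's pipeline. `detect_at_scale` is a hypothetical callable that runs a detector on the image resized by factor `s` and returns `(quad, score)` pairs, the scale list is illustrative, and suppression uses the axis-aligned bounding boxes of the quads for brevity, whereas a mask-based detector would more likely merge on mask or polygon overlap.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Quad = List[Point]
Detection = Tuple[Quad, float]  # (vertices, confidence score)

def _bbox(quad: Quad) -> Tuple[float, float, float, float]:
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)

def _box_iou(q1: Quad, q2: Quad) -> float:
    # IoU of the axis-aligned bounding boxes (a simplification for the sketch).
    ax1, ay1, ax2, ay2 = _bbox(q1)
    bx1, by1, bx2, by2 = _bbox(q2)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def multi_scale_detect(detect_at_scale: Callable[[float], List[Detection]],
                       scales: Sequence[float],
                       iou_thr: float = 0.5) -> List[Detection]:
    pooled: List[Detection] = []
    for s in scales:
        for quad, score in detect_at_scale(s):
            # Coordinates are in the resized image; divide by s to map back.
            pooled.append(([(x / s, y / s) for x, y in quad], score))
    # Merge duplicates across scales with greedy NMS, highest score first.
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for quad, score in pooled:
        if all(_box_iou(quad, k) < iou_thr for k, _ in kept):
            kept.append((quad, score))
    return kept
```

Here the scales are plain resize factors; pipelines that instead specify target short-side lengths are equivalent to choosing a per-image factor.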

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-22 | AntAI-Cognition | 84.45% | 88.55% | 80.72% | 77.19%
2020-04-16 | TH | 84.36% | 89.66% | 79.65% | 77.33%
2019-11-08 | Sogou_OCR | 83.93% | 89.95% | 78.66% | 75.91%
2019-08-08 | JDAI | 82.82% | 87.83% | 78.35% | 76.15%
2019-06-02 | NJU-ImagineLab | 82.74% | 86.62% | 79.19% | 76.32%
2019-05-30 | PMTD | 82.12% | 87.05% | 77.72% | 75.22%
2019-06-11 | 4Paradigm-Data-Intelligence | 81.60% | 85.27% | 78.22% | 66.62%
2021-03-21 | OSKDet | 81.43% | 87.66% | 76.02% | 73.09%
2019-05-23 | 4Paradigm-Data-Intelligence | 80.99% | 85.33% | 77.08% | 65.66%
2019-05-08 | Baidu-VIS | 80.65% | 86.31% | 75.68% | 65.15%
2019-11-05 | baseline_maskrcnn | 80.24% | 86.62% | 74.74% | 71.18%
2019-03-23 | PMTD | 80.18% | 85.20% | 75.72% | 72.28%
2019-12-13 | BDN | 79.47% | 82.75% | 76.44% | 63.08%
2019-08-20 | juxinli | 78.51% | 85.13% | 72.84% | 69.66%
2021-11-02 | fpa | 78.48% | 85.09% | 72.82% | 69.62%
2024-03-14 | gts | 78.32% | 89.27% | 69.76% | 68.01%
2021-05-03 | NCU_MSP | 78.23% | 84.54% | 72.79% | 61.57%
2021-03-25 | NCU_MSP | 77.93% | 84.23% | 72.51% | 61.07%
2021-05-17 | NCU_FPN | 77.49% | 80.25% | 74.90% | 59.99%
2022-04-22 | TextBPN++ (ResNet-50 with DCN) | 77.48% | 83.74% | 72.10% | 60.47%
2021-05-03 | adapt | 77.39% | 80.96% | 74.13% | 60.05%
2018-12-22 | PKU_VDIG | 77.29% | 78.73% | 75.90% | 71.41%
2018-11-15 | USTC-NELSLIP | 76.85% | 79.33% | 74.51% | 69.04%
2021-12-12 | a | 76.38% | 80.42% | 72.73% | 58.88%
2021-12-12 | b | 76.36% | 80.87% | 72.32% | 58.76%
2020-09-28 | DCLNet | 76.29% | 81.93% | 71.37% | 58.77%
2018-10-29 | Amap-CVLab | 76.08% | 80.91% | 71.79% | 67.72%
2021-03-03 | NCU_MSP_light | 75.82% | 82.54% | 70.12% | 57.88%
2019-03-19 | ccnet single scale | 75.77% | 81.27% | 70.97% | 61.97%
2023-05-22 | DeepSolo++ (ResNet-50) | 75.55% | 86.22% | 67.22% | 65.06%
2018-08-23 | Sogou_MM | 75.13% | 80.35% | 70.56% | 66.33%
2020-10-21 | gccnet-ensemble | 75.13% | 79.25% | 71.41% | 66.18%
2020-10-16 | Drew | 75.09% | 83.41% | 68.29% | 64.53%
2018-11-20 | Pixel-Anchor | 74.79% | 84.24% | 67.24% | 56.83%
2020-12-08 | cascade | 74.77% | 84.68% | 66.94% | 64.23%
2019-03-29 | GNNets (single scale) | 74.55% | 81.23% | 68.89% | 62.05%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 74.13% | 80.61% | 68.62% | 55.20%
2018-11-28 | CRAFT | 74.03% | 80.82% | 68.30% | 55.17%
2019-01-08 | ALGCD_CP | 73.84% | 80.84% | 67.96% | 57.13%
2018-03-12 | ATL Cangjie OCR | 73.52% | 78.88% | 68.84% | 64.30%
2018-01-22 | FOTS_v2 | 73.31% | 83.06% | 65.61% | 59.93%
2017-11-09 | EAST++ | 72.86% | 80.42% | 66.61% | 54.94%
2021-12-31 | TextPMs | 72.49% | 80.95% | 65.64% | 53.30%
2020-12-08 | corner | 72.45% | 81.43% | 65.25% | 62.10%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 72.45% | 77.01% | 68.40% | 52.51%
2023-12-17 | mlt_ch_03 | 72.37% | 81.60% | 65.02% | 53.19%
2022-04-11 | TextBPN++ (ResNet-50) | 72.33% | 80.49% | 65.67% | 53.05%
2019-07-15 | stela | 71.50% | 78.68% | 65.52% | 60.26%
2018-12-13 | AutoCV | 71.41% | 72.40% | 70.46% | 62.63%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 70.39% | 76.55% | 65.16% | 49.79%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 70.00% | 73.40% | 66.89% | 49.02%
2018-12-05 | EPTN-SJTU | 67.58% | 75.71% | 61.02% | 49.59%
2019-05-30 | Thesis-SE | 67.22% | 75.68% | 60.47% | 47.30%
2024-04-02 | FPDIoU | 66.04% | 84.19% | 54.33% | 45.86%
2019-09-18 | mask RCNN Augment+ | 66.02% | 80.80% | 55.82% | 51.50%
2017-06-28 | SCUT_DLVClab1 | 64.96% | 80.28% | 54.54% | 50.34%
2017-06-30 | Sensetime OCR | 62.56% | 56.93% | 69.43% | 61.24%
2017-06-29 | SARI_FDU_RRPN_v1 | 62.37% | 71.17% | 55.50% | 50.33%
2017-06-28 | SARI_FDU_RRPN_v0 | 60.66% | 67.07% | 55.37% | 48.76%
2022-01-05 | dbnet_resnet18 | 60.60% | 64.62% | 57.05% | 47.71%
2019-01-03 | YY AI OCR Group | 52.60% | 64.77% | 44.28% | 29.67%
2017-06-30 | TH-DL | 45.97% | 67.75% | 34.78% | 30.88%
2017-06-30 | linkage-ER-Flow | 32.49% | 44.48% | 25.59% | 15.47%
2019-10-14 | TextSnake | 21.31% | 28.99% | 16.85% | 4.89%
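The Hmean column is the harmonic mean of Precision (P) and Recall (R), Hmean = 2PR/(P+R), so each row can be sanity-checked from its other two columns; Average Precision summarizes the whole precision-recall curve and cannot be recovered this way. The sketch below recomputes Hmean for a handful of rows copied from the table and flags any that disagree by more than `tol`, an assumed allowance for the two-decimal rounding of the published figures.

```python
from typing import List, Tuple

# (method, Hmean, Precision, Recall) in percent, copied from the ranking table.
ROWS: List[Tuple[str, float, float, float]] = [
    ("AntAI-Cognition", 84.45, 88.55, 80.72),
    ("TH", 84.36, 89.66, 79.65),
    ("Sogou_OCR", 83.93, 89.95, 78.66),
    ("Sensetime OCR", 62.56, 56.93, 69.43),
    ("TextSnake", 21.31, 28.99, 16.85),
]

def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def inconsistent_rows(rows: List[Tuple[str, float, float, float]],
                      tol: float = 0.02) -> List[str]:
    """Methods whose reported Hmean is more than `tol` points from 2PR/(P+R)."""
    return [name for name, h, p, r in rows if abs(hmean(p, r) - h) > tol]

print(inconsistent_rows(ROWS))  # prints []
```

For example, `hmean(88.55, 80.72)` evaluates to about 84.454, which rounds to the 84.45% reported for AntAI-Cognition.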
