Results - ICDAR2017 Competition on Multi-lingual scene text detection and script identification

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNeXt101-64x4d [2], CBNet [3], ResNeXt101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. For training, we also add the ICDAR19 MLT dataset; both the training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone, and multi-scale training and testing are adopted to obtain the final results.

Ranking Table

Date	Method	Hmean	Precision	Recall	Average Precision
2020-04-22	AntAI-Cognition	84.45%	88.55%	80.72%	77.19%
2020-04-16	TH	84.36%	89.66%	79.65%	77.33%
2019-11-08	Sogou_OCR	83.93%	89.95%	78.66%	75.91%
2019-08-08	JDAI	82.82%	87.83%	78.35%	76.15%
2019-06-02	NJU-ImagineLab	82.74%	86.62%	79.19%	76.32%
2019-05-30	PMTD	82.12%	87.05%	77.72%	75.22%
2019-06-11	4Paradigm-Data-Intelligence	81.60%	85.27%	78.22%	66.62%
2021-03-21	OSKDet	81.43%	87.66%	76.02%	73.09%
2019-05-23	4Paradigm-Data-Intelligence	80.99%	85.33%	77.08%	65.66%
2019-05-08	Baidu-VIS	80.65%	86.31%	75.68%	65.15%
2019-11-05	baseline_maskrcnn	80.24%	86.62%	74.74%	71.18%
2019-03-23	PMTD	80.18%	85.20%	75.72%	72.28%
2019-12-13	BDN	79.47%	82.75%	76.44%	63.08%
2019-08-20	juxinli	78.51%	85.13%	72.84%	69.66%
2021-11-02	fpa	78.48%	85.09%	72.82%	69.62%
2024-03-14	gts	78.32%	89.27%	69.76%	68.01%
2021-05-03	NCU_MSP	78.23%	84.54%	72.79%	61.57%
2021-03-25	NCU_MSP	77.93%	84.23%	72.51%	61.07%
2025-05-03	mixnet325	77.85%	84.18%	72.41%	60.90%
2021-05-17	NCU_FPN	77.49%	80.25%	74.90%	59.99%
2022-04-22	TextBPN++(ResNet-50 with DCN)	77.48%	83.74%	72.10%	60.47%
2021-05-03	adapt	77.39%	80.96%	74.13%	60.05%
2018-12-22	PKU_VDIG	77.29%	78.73%	75.90%	71.41%
2025-05-03	mixnet350	77.12%	82.43%	72.46%	59.84%
2018-11-15	USTC-NELSLIP	76.85%	79.33%	74.51%	69.04%
2021-12-12	a	76.38%	80.42%	72.73%	58.88%
2021-12-12	b	76.36%	80.87%	72.32%	58.76%
2020-09-28	DCLNet	76.29%	81.93%	71.37%	58.77%
2018-10-29	Amap-CVLab	76.08%	80.91%	71.79%	67.72%
2021-03-03	NCU_MSP_light	75.82%	82.54%	70.12%	57.88%
2019-03-19	ccnet single scale	75.77%	81.27%	70.97%	61.97%
2023-05-22	DeepSolo++ (ResNet-50)	75.55%	86.22%	67.22%	65.06%
2018-08-23	Sogou_MM	75.13%	80.35%	70.56%	66.33%
2020-10-21	gccnet-ensemble	75.13%	79.25%	71.41%	66.18%
2020-10-16	Drew	75.09%	83.41%	68.29%	64.53%
2018-11-20	Pixel-Anchor	74.79%	84.24%	67.24%	56.83%
2020-12-08	cascade	74.77%	84.68%	66.94%	64.23%
2019-03-29	GNNets (single scale)	74.55%	81.23%	68.89%	62.05%
2018-12-04	SPCNet_TongJi & UESTC (multi scale)	74.13%	80.61%	68.62%	55.20%
2018-11-28	CRAFT	74.03%	80.82%	68.30%	55.17%
2019-01-08	ALGCD_CP	73.84%	80.84%	67.96%	57.13%
2018-03-12	ATL Cangjie OCR	73.52%	78.88%	68.84%	64.30%
2018-01-22	FOTS_v2	73.31%	83.06%	65.61%	59.93%
2017-11-09	EAST++	72.86%	80.42%	66.61%	54.94%
2021-12-31	TextPMs	72.49%	80.95%	65.64%	53.30%
2020-12-08	corner	72.45%	81.43%	65.25%	62.10%
2018-05-18	PSENet_NJU_ImagineLab (single-scale)	72.45%	77.01%	68.40%	52.51%
2023-12-17	mlt_ch_03	72.37%	81.60%	65.02%	53.19%
2022-04-11	TextBPN++(ResNet-50)	72.33%	80.49%	65.67%	53.05%
2019-07-15	stela	71.50%	78.68%	65.52%	60.26%
2018-12-13	AutoCV	71.41%	72.40%	70.46%	62.63%
2018-12-02	Shape-Aware Based Scene Text Detector (single scale)	70.39%	76.55%	65.16%	49.79%
2018-12-03	SPCNet_TongJi & UESTC (single scale)	70.00%	73.40%	66.89%	49.02%
2018-12-05	EPTN-SJTU	67.58%	75.71%	61.02%	49.59%
2019-05-30	Thesis-SE	67.22%	75.68%	60.47%	47.30%
2024-04-02	FPDIoU	66.04%	84.19%	54.33%	45.86%
2019-09-18	mask RCNN Augment+	66.02%	80.80%	55.82%	51.50%
2017-06-28	SCUT_DLVClab1	64.96%	80.28%	54.54%	50.34%
2017-06-30	Sensetime OCR	62.56%	56.93%	69.43%	61.24%
2017-06-29	SARI_FDU_RRPN_v1	62.37%	71.17%	55.50%	50.33%
2017-06-28	SARI_FDU_RRPN_v0	60.66%	67.07%	55.37%	48.76%
2022-01-05	dbnet_resnet18	60.60%	64.62%	57.05%	47.71%
2019-01-03	YY AI OCR Group	52.60%	64.77%	44.28%	29.67%
2017-06-30	TH-DL	45.97%	67.75%	34.78%	30.88%
2017-06-30	linkage-ER-Flow	32.49%	44.48%	25.59%	15.47%
2019-10-14	TextSnake	21.31%	28.99%	16.85%	4.89%
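
The Hmean column is consistent with the harmonic mean of the Precision and Recall columns, Hmean = 2PR / (P + R). The following is a minimal sketch that checks this against the top three rows of the table; the function name `hmean` and the `rows` list are illustrative and not part of the competition's evaluation code. Because the table values are rounded to two decimals, the recomputed figures can differ from the reported ones in the last digit.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are 0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %, reported Hmean %) copied from the ranking table
rows = [
    ("AntAI-Cognition", 88.55, 80.72, 84.45),
    ("TH", 89.66, 79.65, 84.36),
    ("Sogou_OCR", 89.95, 78.66, 83.93),
]

for name, p, r, reported in rows:
    print(f"{name}: computed {hmean(p, r):.2f}%, reported {reported:.2f}%")
```

Since the harmonic mean is dominated by the smaller of its two inputs, a method with high precision but low recall (for example FPDIoU, at 84.19% precision and 54.33% recall) ranks well below methods with more balanced scores.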
