Results - ICDAR2017 Competition on Multi-lingual scene text detection and script identification

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1] and differ in their backbones (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training data; both training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.

Ranking Table


Date	Method	Hmean	Precision	Recall	Average Precision
2020-04-16	TH	70.94%	62.53%	81.97%	67.52%
2019-11-08	Sogou_OCR	69.37%	60.97%	80.46%	65.51%
2020-04-22	AntAI-Cognition	68.30%	59.02%	81.04%	56.02%
2021-03-21	OSKDet	67.10%	59.52%	76.89%	63.68%
2019-05-30	PMTD	64.77%	55.26%	78.23%	63.79%
2019-08-08	JDAI	64.38%	55.33%	76.96%	61.58%
2019-05-08	Baidu-VIS	64.31%	55.42%	76.59%	41.52%
2023-05-22	DeepSolo++ (ResNet-50)	63.06%	57.33%	70.07%	56.41%
2019-03-23	PMTD	63.05%	53.85%	76.05%	61.26%
2019-11-05	baseline_maskrcnn	62.72%	53.14%	76.52%	59.67%
2024-03-14	gts	62.65%	58.22%	67.80%	54.49%
2020-09-28	DCLNet	62.38%	53.15%	75.48%	40.36%
2019-08-20	juxinli	62.26%	53.63%	74.20%	59.85%
2019-06-11	4Paradigm-Data-Intelligence	62.20%	50.86%	80.06%	39.66%
2021-11-02	fpa	62.15%	53.49%	74.17%	59.74%
2021-05-03	NCU_MSP	61.95%	54.25%	72.20%	39.75%
2022-04-22	TextBPN++(ResNet-50 with DCN)	61.79%	51.90%	76.32%	40.10%
2019-06-02	NJU-ImagineLab	61.30%	51.50%	75.70%	58.97%
2019-05-23	4Paradigm-Data-Intelligence	61.04%	50.33%	77.53%	37.92%
2021-03-25	NCU_MSP	60.86%	52.74%	71.93%	38.40%
2021-12-12	a	58.54%	47.82%	75.46%	36.26%
2022-04-11	TextBPN++(ResNet-50)	58.53%	50.16%	70.25%	35.89%
2021-12-31	TextPMs	57.74%	48.46%	71.44%	35.14%
2020-12-08	cascade	57.71%	50.22%	67.83%	50.04%
2019-03-19	ccnet single scale	57.62%	49.65%	68.63%	47.01%
2021-12-12	b	57.40%	46.64%	74.62%	34.91%
2021-05-03	adapt	56.87%	46.75%	72.57%	34.16%
2020-10-16	Drew	56.74%	48.75%	67.86%	51.43%
2018-11-20	Pixel-Anchor	56.66%	50.77%	64.09%	34.21%
2019-03-29	GNNets (single scale)	56.54%	49.30%	66.29%	44.67%
2019-12-13	BDN	56.27%	44.91%	75.30%	32.80%
2018-10-29	Amap-CVLab	56.25%	47.01%	70.00%	49.59%
2018-11-15	USTC-NELSLIP	55.58%	43.95%	75.59%	60.29%
2018-01-22	FOTS_v2	55.33%	47.58%	66.08%	46.29%
2018-11-28	CRAFT	55.23%	48.07%	64.89%	30.47%
2021-03-03	NCU_MSP_light	55.10%	45.57%	69.64%	32.14%
2023-12-17	mlt_ch_03	54.84%	45.99%	67.91%	31.73%
2018-05-18	PSENet_NJU_ImagineLab (single-scale)	54.57%	45.44%	68.28%	30.21%
2021-05-17	NCU_FPN	53.91%	42.62%	73.33%	31.40%
2018-12-22	PKU_VDIG	53.84%	41.91%	75.29%	54.46%
2019-07-15	stela	53.32%	45.31%	64.76%	45.50%
2018-03-12	ATL Cangjie OCR	53.18%	42.20%	71.88%	56.17%
2020-12-08	corner	52.36%	43.72%	65.26%	46.69%
2018-12-04	SPCNet_TongJi & UESTC (multi scale)	52.06%	43.26%	65.38%	27.39%
2017-06-28	SCUT_DLVClab1	51.27%	45.80%	58.24%	38.59%
2020-10-21	gccnet-ensemble	51.26%	41.40%	67.26%	42.97%
2017-11-09	EAST++	51.07%	43.27%	62.30%	29.75%
2019-01-08	ALGCD_CP	50.60%	41.17%	65.65%	27.30%
2018-08-23	Sogou_MM	50.35%	40.40%	66.81%	44.72%
2019-09-18	mask RCNN Augment+	50.23%	45.24%	56.46%	36.66%
2024-04-02	FPDIoU	50.14%	56.15%	45.29%	26.15%
2018-12-02	Shape-Aware Based Scene Text Detector (single scale)	49.96%	40.58%	65.00%	25.68%
2019-05-30	Thesis-SE	45.58%	37.34%	58.49%	23.54%
2018-12-13	AutoCV	45.01%	33.36%	69.15%	41.95%
2018-12-03	SPCNet_TongJi & UESTC (single scale)	44.38%	32.88%	68.23%	21.78%
2018-12-05	EPTN-SJTU	43.88%	36.03%	56.10%	21.90%
2017-06-29	SARI_FDU_RRPN_v1	39.40%	32.19%	50.76%	24.45%
2022-01-05	dbnet_resnet18	38.87%	29.97%	55.30%	29.60%
2017-06-28	SARI_FDU_RRPN_v0	38.55%	28.94%	57.72%	30.92%
2017-06-30	TH-DL	32.73%	30.16%	35.78%	20.68%
2019-01-03	YY AI OCR Group	31.43%	25.24%	41.65%	13.07%
2017-06-30	Sensetime OCR	27.39%	17.11%	68.68%	44.78%
2017-06-30	linkage-ER-Flow	14.40%	10.71%	21.99%	4.02%
2019-10-14	TextSnake	6.15%	4.19%	11.55%	0.50%
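
The Hmean column appears to be the harmonic mean of Precision and Recall, which is the usual convention for this kind of detection benchmark. The sketch below recomputes it for the top three rows as a sanity check. The function name `hmean` and the hard-coded `rows` list are illustrative assumptions, not part of the competition tooling, and the official evaluation may round or aggregate differently.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %, reported Hmean %) copied from the table above.
rows = [
    ("TH", 62.53, 81.97, 70.94),
    ("Sogou_OCR", 60.97, 80.46, 69.37),
    ("AntAI-Cognition", 59.02, 81.04, 68.30),
]

for method, p, r, reported in rows:
    print(f"{method}: computed {hmean(p, r):.2f}%, reported {reported:.2f}%")
```

For these three rows the recomputed value agrees with the reported Hmean to two decimal places. Average Precision is computed separately by the evaluation and cannot be derived from the other columns this way.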
