Results - ICDAR2017 Competition on Multi-lingual scene text detection and script identification

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1], each with a different backbone (ResNeXt101-64x4d [2], CBNet [3], ResNeXt101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT dataset to the training data; both training and validation sets are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in neural information processing systems. 2017: 3146-3154.

Ranking Table


Date	Method	Hmean	Precision	Recall	Average Precision
2020-04-16	TH	59.17%	44.09%	89.91%	79.49%
2019-11-08	Sogou_OCR	57.05%	42.33%	87.46%	68.55%
2020-04-22	AntAI-Cognition	56.68%	41.26%	90.51%	74.90%
2024-04-02	FPDIoU	56.62%	46.96%	71.28%	32.79%
2021-03-21	OSKDet	56.12%	42.00%	84.54%	70.22%
2023-05-22	DeepSolo++ (ResNet-50)	55.30%	41.29%	83.70%	74.98%
2019-03-29	GNNets (single scale)	52.80%	39.41%	79.97%	47.89%
2018-11-20	Pixel-Anchor	52.70%	39.01%	81.16%	42.19%
2019-08-08	JDAI	52.31%	37.11%	88.60%	73.66%
2019-03-19	ccnet single scale	52.16%	37.49%	85.71%	68.08%
2024-03-14	gts	52.12%	39.47%	76.72%	64.64%
2020-09-28	DCLNet	51.90%	37.60%	83.76%	32.20%
2019-05-30	PMTD	51.69%	36.90%	86.29%	75.91%
2019-05-08	Baidu-VIS	51.66%	37.18%	84.62%	30.96%
2021-05-03	NCU_MSP	51.65%	37.83%	81.37%	30.93%
2019-03-23	PMTD	51.22%	36.71%	84.73%	69.90%
2019-06-02	NJU-ImagineLab	50.41%	35.04%	89.81%	75.59%
2021-03-25	NCU_MSP	50.15%	36.25%	81.33%	29.96%
2019-08-20	juxinli	49.70%	35.76%	81.45%	66.44%
2021-11-02	fpa	49.54%	35.60%	81.39%	66.31%
2022-04-22	TextBPN++(ResNet-50 with DCN)	49.36%	34.92%	84.17%	30.76%
2019-11-05	baseline_maskrcnn	48.55%	34.43%	82.30%	60.07%
2020-12-08	cascade	48.00%	34.38%	79.50%	65.20%
2019-06-11	4Paradigm-Data-Intelligence	47.35%	32.46%	87.48%	28.06%
2019-05-23	4Paradigm-Data-Intelligence	47.34%	32.51%	87.05%	28.48%
2018-01-22	FOTS_v2	47.05%	33.05%	81.66%	65.40%
2020-10-16	Drew	46.94%	33.24%	79.85%	63.57%
2022-04-11	TextBPN++(ResNet-50)	46.72%	33.91%	75.09%	26.78%
2018-10-29	Amap-CVLab	46.39%	32.63%	80.22%	66.26%
2021-12-12	a	46.18%	31.67%	85.26%	27.63%
2019-07-15	stela	46.09%	32.67%	78.26%	57.95%
2018-05-18	PSENet_NJU_ImagineLab (single-scale)	45.98%	32.86%	76.53%	25.15%
2021-05-03	adapt	45.10%	30.97%	82.98%	26.45%
2021-12-12	b	44.99%	30.58%	85.08%	26.55%
2017-11-09	EAST++	44.21%	30.80%	78.34%	27.79%
2019-09-18	mask RCNN Augment+	43.89%	31.61%	71.78%	48.67%
2019-12-13	BDN	43.62%	29.16%	86.52%	25.00%
2021-12-31	TextPMs	43.52%	30.59%	75.42%	23.82%
2018-11-28	CRAFT	43.46%	31.91%	68.09%	22.40%
2018-11-15	USTC-NELSLIP	43.44%	29.18%	84.97%	69.13%
2017-06-28	SCUT_DLVClab1	43.07%	31.35%	68.79%	47.05%
2020-10-21	gccnet-ensemble	42.81%	28.73%	83.96%	58.86%
2021-03-03	NCU_MSP_light	42.44%	28.95%	79.50%	23.49%
2018-12-04	SPCNet_TongJi & UESTC (multi scale)	41.71%	28.26%	79.62%	22.37%
2020-12-08	corner	41.36%	28.55%	75.01%	54.07%
2018-08-23	Sogou_MM	41.00%	27.07%	84.50%	67.28%
2018-12-22	PKU_VDIG	40.56%	26.79%	83.53%	66.55%
2021-05-17	NCU_FPN	40.42%	26.75%	82.67%	22.65%
2023-12-17	mlt_ch_03	40.26%	27.92%	72.17%	21.17%
2018-03-12	ATL Cangjie OCR	40.05%	26.58%	81.19%	64.38%
2019-01-08	ALGCD_CP	39.92%	26.57%	80.24%	24.09%
2018-12-02	Shape-Aware Based Scene Text Detector (single scale)	39.66%	26.77%	76.49%	20.54%
2018-12-05	EPTN-SJTU	38.53%	25.66%	77.32%	21.78%
2019-05-30	Thesis-SE	37.74%	25.09%	76.08%	21.23%
2017-06-29	SARI_FDU_RRPN_v1	34.74%	22.99%	71.06%	51.15%
2018-12-13	AutoCV	33.15%	20.79%	81.76%	42.92%
2022-01-05	dbnet_resnet18	33.13%	21.38%	73.55%	44.17%
2018-12-03	SPCNet_TongJi & UESTC (single scale)	31.65%	19.72%	80.12%	15.98%
2017-06-28	SARI_FDU_RRPN_v0	29.58%	18.86%	68.59%	35.58%
2017-06-30	TH-DL	29.11%	20.54%	49.94%	30.05%
2019-01-03	YY AI OCR Group	25.51%	16.44%	56.96%	11.36%
2017-06-30	Sensetime OCR	16.23%	9.03%	80.09%	47.00%
2019-10-14	TextSnake	5.35%	3.07%	20.52%	0.64%
2017-06-30	linkage-ER-Flow	4.81%	2.95%	13.05%	0.99%
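
The page does not define the Hmean column. A minimal sketch, assuming it is the standard harmonic mean of the Precision and Recall columns (an assumption, although the first two rows of the table are consistent with it), shows how the value can be recomputed. The function name `hmean` is illustrative and not part of any competition tooling.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent.

    Assumption: this is the standard F-measure formula; the results
    page itself does not state how Hmean is computed.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %) copied from the first two table rows.
rows = [
    ("TH", 44.09, 89.91),
    ("Sogou_OCR", 42.33, 87.46),
]

for name, p, r in rows:
    print(f"{name}: {hmean(p, r):.2f}%")
# prints:
# TH: 59.17%
# Sogou_OCR: 57.05%
```

Both recomputed values match the Hmean column, and the table is ordered by that column from highest to lowest.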
