Results - ICDAR2017 Competition on Multi-lingual scene text detection and script identification

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Description: Our approach is an ensemble of three text detection models. Each model mainly follows the Mask R-CNN framework [1] but uses a different backbone: ResNeXt101-64x4d [2], CBNet [3], or ResNeXt101-32x32d_wsl [4]. A GBDT [5] is trained to normalize the confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all detection models. Multi-scale training and testing are adopted for all base models. The ICDAR 2019 MLT data is added to the training set, and both the training and validation sets are used to produce the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
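The score-normalization and box-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the description does not say which features the GBDT uses or how boxes from different models are matched. Here `toy_rescore` is a hypothetical stand-in for the trained GBDT (a simple per-model calibration), and the selection step is swapped in as greedy quadrilateral NMS over the pooled, rescored boxes. Quadrilateral IoU uses Sutherland-Hodgman clipping, which assumes convex quads with counter-clockwise vertex order.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Quad = List[Point]                    # four (x, y) vertices, counter-clockwise
Detection = Tuple[int, Quad, float]   # (model_id, quad, raw_score)


def polygon_area(poly: Sequence[Point]) -> float:
    """Absolute polygon area via the shoelace formula (0 for < 3 points)."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` by a convex, counter-clockwise `clipper`."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        # p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of segment p -> q with the line through a -> b.
        dx, dy = q[0] - p[0], q[1] - p[1]
        ex, ey = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * ey - (a[1] - p[1]) * ex) / (dx * ey - dy * ex)
        return (p[0] + t * dx, p[1] + t * dy)

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def quad_iou(q1: Quad, q2: Quad) -> float:
    inter = polygon_area(clip(q1, q2))
    union = polygon_area(q1) + polygon_area(q2) - inter
    return inter / union if union > 0 else 0.0


def toy_rescore(model_id: int, score: float) -> float:
    # HYPOTHETICAL stand-in for the trained GBDT: per-model scaling so that
    # scores from different backbones become comparable.
    scale = {0: 1.0, 1: 0.9, 2: 1.1}[model_id]
    return min(1.0, scale * score)


def fuse(dets: List[Detection], rescore: Callable[[int, float], float],
         iou_thresh: float = 0.5) -> List[Tuple[float, Quad]]:
    """Pool all models' boxes, rescore them, then keep the best non-overlapping ones."""
    scored = sorted(((rescore(m, s), q) for m, q, s in dets),
                    key=lambda t: t[0], reverse=True)
    kept: List[Tuple[float, Quad]] = []
    for score, quad in scored:
        if all(quad_iou(quad, k) < iou_thresh for _, k in kept):
            kept.append((score, quad))
    return kept
```

With two overlapping boxes from models 0 and 1 (IoU of about 0.82) and a distant box from model 2, `fuse` keeps the higher-rescored of the overlapping pair plus the distant box. A real system would replace `toy_rescore` with a trained model over richer per-box features.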

method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. A ResNeXt-101 backbone and multi-scale training and testing are used.

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone, and multi-scale training and testing are adopted to obtain the final results.
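Several of the entries above use multi-scale testing. The core bookkeeping step, mapping detections from each resized copy of the image back into original-image coordinates before merging, can be sketched as below. This is a generic illustration, not any team's code: `detect` is a hypothetical callable that returns `(quad, score)` pairs in the frame of the image resized by a given factor, and the pooled output would still need a merging step such as NMS.

```python
from typing import Callable, List, Sequence, Tuple

Quad = List[Tuple[float, float]]
# HYPOTHETICAL detector interface: given a resize factor, return (quad, score)
# pairs expressed in the coordinates of the resized image.
Detector = Callable[[float], List[Tuple[Quad, float]]]


def multi_scale_detect(detect: Detector,
                       scales: Sequence[float]) -> List[Tuple[Quad, float]]:
    """Run `detect` at every scale and pool results in original-image coordinates."""
    pooled: List[Tuple[Quad, float]] = []
    for s in scales:
        for quad, score in detect(s):
            # Undo the resize so boxes from all scales share one frame.
            pooled.append(([(x / s, y / s) for x, y in quad], score))
    return pooled
```

Keeping the scales as plain factors makes the inverse mapping a single division per coordinate; pipelines that resize to a fixed short side would compute the factor per image instead.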

Ranking Table


Date	Method	Hmean	Precision	Recall	Average Precision
2020-04-22	AntAI-Cognition	84.36%	85.92%	82.86%	78.41%
2020-04-16	TH	84.19%	87.21%	81.38%	78.36%
2019-11-08	Sogou_OCR	83.74%	87.22%	80.54%	76.87%
2019-08-08	JDAI	82.50%	84.24%	80.82%	78.13%
2019-06-02	NJU-ImagineLab	82.40%	83.20%	81.62%	77.98%
2021-03-21	OSKDet	81.98%	85.90%	78.40%	75.08%
2019-05-30	PMTD	81.88%	84.15%	79.74%	76.74%
2019-06-11	4Paradigm-Data-Intelligence	81.07%	81.85%	80.30%	65.57%
2019-05-08	Baidu-VIS	80.75%	83.95%	77.79%	65.11%
2019-05-23	4Paradigm-Data-Intelligence	80.62%	81.80%	79.47%	64.81%
2019-03-23	PMTD	80.49%	83.04%	78.09%	74.19%
2019-11-05	baseline_maskrcnn	79.76%	83.19%	76.59%	71.75%
2021-05-03	NCU_MSP	79.30%	83.30%	75.66%	63.01%
2024-03-14	gts	79.16%	86.25%	73.15%	71.03%
2021-03-25	NCU_MSP	78.80%	82.69%	75.27%	62.18%
2019-12-13	BDN	78.69%	79.18%	78.20%	61.94%
2019-08-20	juxinli	78.47%	82.52%	74.80%	71.03%
2021-11-02	fpa	78.43%	82.45%	74.78%	71.00%
2021-05-03	adapt	77.93%	79.00%	76.89%	60.69%
2019-03-19	ccnet single scale	77.55%	81.30%	74.14%	64.94%
2021-05-17	NCU_FPN	77.29%	77.03%	77.55%	59.59%
2018-10-29	Amap-CVLab	77.20%	79.64%	74.91%	70.51%
2020-09-28	DCLNet	77.08%	81.22%	73.33%	59.69%
2018-12-22	PKU_VDIG	77.07%	76.01%	78.16%	73.09%
2019-03-29	GNNets (single scale)	76.90%	82.75%	71.83%	64.55%
2018-11-15	USTC-NELSLIP	76.88%	77.47%	76.30%	71.29%
2022-04-22	TextBPN++(ResNet-50 with DCN)	76.80%	81.07%	72.96%	59.35%
2018-11-28	CRAFT	76.71%	81.30%	72.60%	59.13%
2021-12-12	a	76.15%	77.94%	74.43%	58.21%
2018-11-20	Pixel-Anchor	76.04%	83.58%	69.75%	58.14%
2020-10-16	Drew	75.82%	81.33%	71.02%	66.76%
2021-12-12	b	75.82%	77.73%	74.00%	57.54%
2020-10-21	gccnet-ensemble	75.47%	76.95%	74.05%	68.39%
2021-03-03	NCU_MSP_light	75.44%	79.11%	72.09%	57.13%
2023-05-22	DeepSolo++ (ResNet-50)	75.00%	83.11%	68.34%	65.70%
2018-05-18	PSENet_NJU_ImagineLab (single-scale)	74.94%	78.55%	71.65%	56.39%
2018-08-23	Sogou_MM	74.75%	76.63%	72.96%	67.72%
2018-12-04	SPCNet_TongJi & UESTC (multi scale)	74.29%	77.17%	71.63%	55.07%
2020-12-08	cascade	74.16%	81.69%	67.91%	64.79%
2017-11-09	EAST++	73.88%	78.90%	69.45%	56.21%
2019-07-15	stela	73.72%	78.67%	69.35%	64.14%
2019-01-08	ALGCD_CP	73.18%	76.52%	70.12%	56.41%
2018-03-12	ATL Cangjie OCR	73.04%	75.47%	70.76%	65.19%
2022-04-11	TextBPN++(ResNet-50)	72.81%	79.98%	66.81%	53.43%
2018-01-22	FOTS_v2	72.40%	79.67%	66.35%	58.38%
2021-12-31	TextPMs	71.97%	78.79%	66.23%	52.35%
2020-12-08	corner	71.78%	77.97%	66.49%	62.88%
2023-12-17	mlt_ch_03	71.16%	77.76%	65.58%	51.19%
2018-12-13	AutoCV	70.69%	69.22%	72.24%	63.15%
2018-12-02	Shape-Aware Based Scene Text Detector (single scale)	70.24%	74.22%	66.66%	49.50%
2024-04-02	FPDIoU	68.82%	85.08%	57.79%	49.25%
2018-12-03	SPCNet_TongJi & UESTC (single scale)	68.08%	68.13%	68.02%	46.20%
2018-12-05	EPTN-SJTU	67.69%	73.30%	62.87%	49.91%
2019-05-30	Thesis-SE	66.83%	72.60%	61.92%	47.65%
2019-09-18	mask RCNN Augment+	65.26%	78.19%	56.01%	51.05%
2017-06-28	SCUT_DLVClab1	63.21%	76.76%	53.73%	48.24%
2017-06-29	SARI_FDU_RRPN_v1	62.25%	68.90%	56.77%	51.75%
2022-01-05	dbnet_resnet18	61.24%	64.47%	58.31%	49.30%
2017-06-28	SARI_FDU_RRPN_v0	59.65%	65.01%	55.12%	48.79%
2017-06-30	Sensetime OCR	57.74%	48.74%	70.83%	60.84%
2019-01-03	YY AI OCR Group	51.96%	60.99%	45.26%	29.73%
2017-06-30	TH-DL	43.38%	62.62%	33.18%	29.50%
2017-06-30	linkage-ER-Flow	29.78%	36.84%	24.99%	13.62%
2019-10-14	TextSnake	17.52%	20.40%	15.36%	3.13%
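The Hmean column is the harmonic mean of Precision and Recall, H = 2PR / (P + R). A minimal check against the table rows is sketched below; because the published Precision and Recall values are themselves rounded to two decimals, a recomputed Hmean may occasionally differ from the listed value in the last digit.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# AntAI-Cognition row: P = 85.92%, R = 82.86%
print(f"{hmean(85.92, 82.86):.2f}")  # prints 84.36
```

Because the harmonic mean is pulled toward the smaller input, entries with a large gap between Precision and Recall (for example Sensetime OCR at 48.74% / 70.83%) rank lower than their better metric alone would suggest.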
