method: TH (2020-04-16)

Authors: Tsinghua University and Hyundai Motor Group AIRS Company

Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn

Description: We built an end-to-end scene text spotter based on Mask R-CNN and a Transformer. A ResNeXt-101 backbone and multi-scale training/testing are used.
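The snippet below is a minimal sketch of the multi-scale testing step mentioned above, assuming a PyTorch/torchvision setup. It uses the stock maskrcnn_resnet50_fpn model as a stand-in, since the submission's ResNeXt-101 backbone and Transformer recognition head are not available here; only the multi-scale inference and box merging are illustrated.

```python
# Hedged sketch of multi-scale testing around an off-the-shelf Mask R-CNN.
# The submission's ResNeXt-101 backbone and Transformer head are not
# reproduced; torchvision's ResNet-50 FPN model is a stand-in.
import torch
import torchvision
from torchvision.transforms.functional import resize, to_tensor
from torchvision.ops import nms

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_multiscale(pil_image, scales=(0.5, 1.0, 1.5), iou_thr=0.5):
    """Run the detector at several image scales and merge the boxes with NMS."""
    w, h = pil_image.size
    all_boxes, all_scores = [], []
    with torch.no_grad():
        for s in scales:
            img = resize(pil_image, [int(h * s), int(w * s)])
            out = model([to_tensor(img)])[0]
            all_boxes.append(out["boxes"] / s)   # map boxes back to the original scale
            all_scores.append(out["scores"])
    boxes = torch.cat(all_boxes)
    scores = torch.cat(all_scores)
    keep = nms(boxes, scores, iou_thr)           # merge duplicates found at different scales
    return boxes[keep], scores[keep]
```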

method: Sogou_OCR (2019-11-08)

Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su

Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as our backbone, and multi-scale training and testing are adopted to obtain the final results.
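As an illustration of how a Mask R-CNN-based detector yields arbitrary-shaped text regions, the sketch below converts a predicted instance mask into a minimum-area rotated quadrilateral with OpenCV. This is a common post-processing choice and an assumption here, not necessarily the team's exact pipeline.

```python
# Hedged sketch: turn a soft Mask R-CNN instance mask into a text quadrilateral.
# Assumes OpenCV 4.x (findContours returns two values).
import cv2
import numpy as np

def mask_to_quad(mask, score_thr=0.5):
    """Binarize a soft instance mask and fit a minimum-area rotated box."""
    binary = (mask > score_thr).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)   # keep the largest connected blob
    rect = cv2.minAreaRect(contour)                # ((cx, cy), (w, h), angle)
    quad = cv2.boxPoints(rect)                     # 4 x 2 array of corner points
    return quad.astype(np.float32)
```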

method: AntAI-Cognition (2020-04-22)

Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu

Affiliation: Ant Group & PKU

Email: qingpei.gqp@antgroup.com

Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The detection models mainly follow the Mask R-CNN framework [1], with different backbones (ResNeXt101-64x4d [2], CBNet [3], ResNeXt101-32x32d_wsl [4]). A GBDT [5] is trained to normalize the confidence scores and select the highest-quality quadrilateral boxes from all detection models' outputs (a sketch of this re-scoring step follows the references below). Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT dataset to the training data; both its training and validation splits are used to obtain the final result.

[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
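The sketch below illustrates the GBDT re-scoring step described above, using LightGBM [5]. The feature set (raw detector score, source-model id, simple box geometry) is an assumption for illustration; the team's actual features and labeling scheme are not published here.

```python
# Hedged sketch of GBDT-based confidence normalization over an ensemble of detectors.
# Candidate boxes from all detectors are re-scored by a LightGBM classifier; the
# feature set below is hypothetical.
import numpy as np
import lightgbm as lgb

def box_features(quad, score, model_id):
    """Hypothetical features for one detected quadrilateral (4x2 array of corners)."""
    xs, ys = quad[:, 0], quad[:, 1]
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    return [score, model_id, w, h, w * h, w / max(h, 1e-6)]

def fit_rescorer(candidates, labels):
    """candidates: list of (quad, raw score, source-model id);
    labels: 1 if the candidate matches a ground-truth box (e.g. IoU >= 0.5), else 0."""
    X = np.array([box_features(q, s, m) for q, s, m in candidates])
    gbdt = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    gbdt.fit(X, np.array(labels))
    return gbdt

def rescore(gbdt, candidates):
    """Replace each detector's raw score with the GBDT's calibrated probability,
    so boxes from different models become comparable before the final selection."""
    X = np.array([box_features(q, s, m) for q, s, m in candidates])
    return gbdt.predict_proba(X)[:, 1]
```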

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2020-04-16 | TH | 55.77% | 41.01% | 87.12% | 73.23%
2019-11-08 | Sogou_OCR | 53.96% | 39.23% | 86.38% | 68.24%
2020-04-22 | AntAI-Cognition | 52.67% | 37.81% | 86.78% | 68.83%
2019-08-08 | JDAI | 49.37% | 34.52% | 86.65% | 74.80%
2019-05-30 | PMTD | 48.30% | 33.75% | 84.95% | 72.61%
2019-06-02 | NJU-ImagineLab | 47.22% | 32.46% | 86.61% | 68.80%
2019-03-29 | GNNets (single scale) | 47.15% | 33.90% | 77.45% | 51.73%
2019-05-08 | Baidu-VIS | 46.60% | 32.89% | 79.96% | 26.26%
2019-03-23 | PMTD | 45.64% | 32.09% | 78.99% | 52.51%
2018-11-20 | Pixel-Anchor | 44.36% | 32.15% | 71.54% | 27.03%
2019-06-11 | 4Paradigm-Data-Intelligence | 43.11% | 29.07% | 83.37% | 24.42%
2019-05-23 | 4Paradigm-Data-Intelligence | 42.96% | 29.06% | 82.32% | 24.00%
2017-06-28 | SCUT_DLVClab1 | 42.20% | 29.86% | 71.92% | 52.71%
2018-10-29 | Amap-CVLab | 41.86% | 28.52% | 78.63% | 51.64%
2019-12-13 | BDN | 41.08% | 26.95% | 86.32% | 23.12%
2018-11-28 | CRAFT | 40.43% | 28.37% | 70.34% | 19.96%
2017-11-09 | EAST++ | 40.08% | 27.29% | 75.47% | 34.10%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 39.63% | 27.08% | 73.87% | 20.21%
2018-11-15 | USTC-NELSLIP | 38.09% | 24.99% | 80.04% | 46.23%
2019-01-08 | ALGCD_CP | 36.41% | 23.88% | 76.63% | 32.94%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 35.96% | 23.95% | 72.19% | 17.42%
2018-12-05 | EPTN-SJTU | 33.30% | 21.52% | 73.52% | 33.46%
2019-05-30 | Thesis-SE | 32.77% | 21.27% | 71.39% | 30.12%
2018-03-12 | ATL Cangjie OCR | 32.11% | 20.84% | 69.98% | 35.50%
2019-07-15 | stela | 32.07% | 22.31% | 57.05% | 28.65%
2017-06-29 | SARI_FDU_RRPN_v1 | 30.15% | 19.23% | 69.71% | 40.70%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 27.97% | 17.16% | 75.68% | 13.15%
2017-06-28 | SARI_FDU_RRPN_v0 | 25.27% | 15.60% | 66.48% | 36.44%
2017-06-30 | TH-DL | 25.09% | 16.98% | 48.08% | 20.81%
2017-06-30 | linkage-ER-Flow | 17.34% | 10.30% | 54.72% | 12.37%
2017-06-30 | Sensetime OCR | 14.98% | 8.27% | 80.04% | 40.38%
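Hmean in the table is the harmonic mean of Precision and Recall. As a quick check, the values of the top-ranked entry reproduce its reported Hmean:

```python
# Sanity check: Hmean = 2 * P * R / (P + R), using the top-ranked entry (TH).
p, r = 0.4101, 0.8712
hmean = 2 * p * r / (p + r)
print(f"{hmean:.2%}")   # 55.77%
```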

Ranking Graphic