method: Tencent-DPPR Team (2019-05-01)
Authors: Haoxi Li, Sicong Liu, Longhuang Wu, Shangxuan Tian, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Xucheng Yin, Lei Xiao
Description: The detection model is based on a two-stage detection method, and the recognition method is an ensemble of several CNN+BiGRU+CTC, CNN+BiGRU+Attention_Seq2Seq, and CNN+SelfAttention+CTC models.
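The CTC-based recognizers above all share a final decoding step: collapsing per-frame predictions into a transcription. A minimal sketch of greedy (best-path) CTC decoding, with an illustrative toy alphabet and function name not taken from the Tencent-DPPR system:

```python
# Hypothetical sketch of CTC greedy decoding, the last step of a
# CNN+BiGRU+CTC or CNN+SelfAttention+CTC recognizer. The blank index
# and charset below are assumptions for illustration only.

BLANK = 0  # CTC blank label, assumed to be index 0

def ctc_greedy_decode(frame_argmax, charset):
    """Collapse repeated labels, then drop blanks (best-path decoding)."""
    out = []
    prev = None
    for idx in frame_argmax:
        if idx != prev and idx != BLANK:
            out.append(charset[idx - 1])  # charset holds non-blank symbols
        prev = idx
    return "".join(out)

# Per-frame argmax over a toy alphabet {a, b, c}:
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 3], "abc"))  # -> "aabc"
```

The blank label lets the decoder keep genuinely repeated characters ("aa") apart from repeated frames of the same character.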
method: HUST_VLRGROUP (2019-04-30)
Authors: Mingkun Yang, Hui Zhang, Zhen Zhu, Mengde Xu, Jiehua Yang, Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu
Description: We are from Huazhong University of Science and Technology. We adopt a two-stage method for this e2e task. Specifically, our detection method is modified from Mask TextSpotter [1] based on the ResNet-50-FPN backbone. We only use its detection part while omitting the text recognition part. We first conduct an aspect-ratio clustering on the training set and set the anchor scales for the region proposal network to (0.1, 0.18, 0.25, 0.5, 1.0, 2.0). In order to produce high-quality proposals, we incorporate Cascade R-CNN [2] into the network and set the positive IoUs to (0.7, 0.5, 0.6, 0.7) and the negative IoUs to (0.3, 0.5, 0.6, 0.7). We also change the convolutions in the last two stages to modulated deformable convolutions [3] to enhance the model's ability to capture the large or long text instances that widely appear in the dataset. The detection network is trained with the minimum side of the input image set to 1600. We conduct multi-scale testing for better performance at scales (1000, 1200, 1400, 1600, 1800). The final results from multiple scales are obtained by filtering out boxes whose scores are under a threshold of 0.7 and then applying standard non-maximum suppression with the overlap threshold set to 0.1. In addition, we mainly use CRNN equipped with multiple advanced backbones and some improvements to obtain the final results. To handle some irregular text instances, we add a rectification module before recognition.
[1] Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV 2018.
[2] Zhaowei Cai, Nuno Vasconcelos. Cascade R-CNN: Delving Into High Quality Object Detection. CVPR 2018.
[3] Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai. Deformable ConvNets v2: More Deformable, Better Results. CoRR abs/1811.11168 (2018).
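The multi-scale fusion step described above (score filter at 0.7, then NMS at overlap 0.1) can be sketched in pure Python for axis-aligned boxes; the function names are illustrative, and the real system may operate on rotated or polygonal boxes:

```python
# Minimal sketch of the described post-processing: keep detections with
# score >= 0.7, then greedy NMS with IoU overlap threshold 0.1.
# Boxes are (x1, y1, x2, y2) tuples; names are assumptions, not the
# HUST_VLRGROUP code.

SCORE_THRESH = 0.7
NMS_IOU = 0.1

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, scores):
    """Return indices of boxes kept after score filtering and NMS."""
    keep = []
    order = sorted((i for i, s in enumerate(scores) if s >= SCORE_THRESH),
                   key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= NMS_IOU for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.75]
print(filter_and_nms(boxes, scores))  # -> [0, 2]
```

The very low overlap threshold of 0.1 is aggressive: with multi-scale testing, the same text instance is typically detected at several scales, so near-duplicates must be suppressed even when their overlap is modest.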
method: PMTD (2019-04-30)
Authors: Qixin Yan, Jingchao Liu, Xuebo Liu, Ding Liang
Description: Pyramid Mask Text Detector; see https://arxiv.org/abs/1903.11800 for details.
Date | Method | 1 - N.E.D. | Precision | Recall | F-score
---|---|---|---|---|---
2019-05-01 | Tencent-DPPR Team | 66.66% | 64.46% | 57.84% | 60.97%
2019-04-30 | HUST_VLRGROUP | 63.42% | 61.75% | 59.39% | 60.55%
2019-04-30 | PMTD | 63.36% | 58.67% | 54.97% | 56.76%
2019-04-29 | baseline_0.7_polygon_class_5435 | 63.16% | 59.37% | 53.23% | 56.13%
2019-04-30 | pursuer | 61.51% | 61.31% | 52.96% | 56.83%
2019-04-30 | MCEM | 60.56% | 70.19% | 54.48% | 61.34%
2019-04-30 | SRC-B_LSVT | 57.48% | 55.40% | 46.83% | 50.76%
2019-04-30 | VIC-LISAR | 56.00% | 57.00% | 47.13% | 51.60%
2019-04-29 | CLTDR | 52.94% | 50.57% | 46.11% | 48.24%
2019-05-01 | Fudan-Supremind Recognition | 52.27% | 47.93% | 39.33% | 43.21%
2019-04-29 | Papago OCR (PixelLink+) | 48.68% | 50.18% | 41.82% | 45.62%
2019-04-20 | Simple Baseline | 36.98% | 28.39% | 27.77% | 28.08%
2019-04-30 | CRAFT + TPS-ResNet v1 | 27.59% | 30.10% | 27.48% | 28.73%
2019-04-30 | spotter | 26.16% | 21.50% | 19.05% | 20.20%
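The "1 - N.E.D." column is a transcription-similarity score: one minus the edit distance between prediction and ground truth, normalized and averaged over text instances. A sketch of the core computation, assuming normalization by the longer string's length (the competition's exact matching protocol may differ):

```python
# Illustrative sketch of a 1 - N.E.D. score: one minus Levenshtein
# distance normalized by the longer string's length, averaged over
# (prediction, ground-truth) pairs. Not the official evaluation code.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def one_minus_ned(pairs):
    """Average 1 - NED over (prediction, ground_truth) pairs."""
    total = 0.0
    for pred, gt in pairs:
        denom = max(len(pred), len(gt)) or 1  # avoid division by zero
        total += 1.0 - levenshtein(pred, gt) / denom
    return total / len(pairs)

print(one_minus_ned([("text", "test"), ("ocr", "ocr")]))  # -> 0.875
```

Unlike the binary match behind precision/recall/F-score, this metric gives partial credit for near-correct transcriptions, which is why the 1 - N.E.D. and F-score rankings above do not fully agree (e.g. MCEM has the best F-score but not the best 1 - N.E.D.).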