method: Tencent-DPPR Team (2019-05-01)

Authors: Haoxi Li, Sicong Liu, Longhuang Wu, Shangxuan Tian, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Xucheng Yin, Lei Xiao

Description: The detection model is based on a two-stage detection method, and the recognition method is an ensemble of several CNN+BiGRU+CTC, CNN+BiGRU+Attention_Seq2Seq, and CNN+SelfAttention+CTC models.
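As a rough illustration of what the CTC-based recognizers above produce, the sketch below shows standard greedy (best-path) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode` and the blank index 0 are assumptions made for this example; the description does not say how the team decodes or combines the ensemble outputs.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into a CTC output sequence.

    frame_labels: the argmax label index for each time step (assumed input).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it starts a new run and is not the blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # The blank between the two runs of 3 keeps them as separate characters.
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))
```

A blank between two identical labels is what lets CTC output doubled characters, which is why repeats are merged before blanks are removed.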

method: HUST_VLRGROUP (2019-04-30)

Authors: Mingkun Yang, Hui Zhang, Zhen Zhu, Mengde Xu, Jiehua Yang, Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu

Description: We are from Huazhong University of Science and Technology. We adopt a two-stage method for this end-to-end (e2e) task. Our detection method is modified from Mask TextSpotter [1] with a ResNet-50-FPN backbone; we use only its detection part and omit the text recognition part. We first run aspect ratio clustering on the training set and set the anchor scales for the region proposal network to (0.1, 0.18, 0.25, 0.5, 1.0, 2.0). To obtain high-quality proposals, we incorporate Cascade R-CNN [2] into the network and set the positive IoUs to (0.7, 0.5, 0.6, 0.7) and the negative IoUs to (0.3, 0.5, 0.6, 0.7). We also replace the convolutions in the last two stages with modulated deformable convolutions [3] to improve the model's ability to capture the large or long text instances that appear widely in the dataset. The detection network is trained with the shorter side of the input image set to 1600. For better performance, we run multi-scale testing at scales (1000, 1200, 1400, 1600, 1800). The final results from the multiple scales are obtained by discarding boxes whose scores are below a threshold of 0.7 and then applying standard non-maximum suppression with the overlap threshold set to 0.1. For recognition, we mainly use CRNN equipped with multiple advanced backbones and some improvements to obtain the final results. To handle irregular text instances, we add a rectification module before recognition.

[1] Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV 2018.
[2] Zhaowei Cai, Nuno Vasconcelos. Cascade R-CNN: Delving Into High Quality Object Detection. CVPR 2018.
[3] Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai. Deformable ConvNets v2: More Deformable, Better Results. CoRR abs/1811.11168 (2018).
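The post-processing step in the HUST_VLRGROUP description (score filtering at 0.7 followed by non-maximum suppression with overlap 0.1) can be sketched as follows. This is a minimal illustration, not the team's code: it assumes axis-aligned boxes `(x1, y1, x2, y2)` already mapped back to the original image coordinates, whereas the actual system may operate on polygons or masks. The names `iou` and `merge_multiscale` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_multiscale(detections, score_thresh=0.7, nms_overlap=0.1):
    """Filter pooled detections by score, then apply greedy NMS.

    detections: list of (box, score) pairs gathered from every test scale
    (assumed to share one coordinate frame).
    """
    # Step 1: discard boxes whose score is below the threshold.
    candidates = [d for d in detections if d[1] >= score_thresh]
    # Step 2: greedy NMS, visiting boxes from highest to lowest score.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it overlaps no already-kept box by more than nms_overlap.
        if all(iou(box, k[0]) <= nms_overlap for k in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    pooled = [
        ((0, 0, 10, 10), 0.95),
        ((1, 1, 11, 11), 0.90),    # heavy overlap with the first box
        ((20, 20, 30, 30), 0.80),
        ((40, 40, 50, 50), 0.50),  # below the score threshold
    ]
    print(merge_multiscale(pooled))
```

With an overlap threshold as low as 0.1, almost any two intersecting boxes are treated as duplicates, which suits multi-scale testing where the same text line is detected several times.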

method: PMTD (2019-04-30)

Authors: Qixin Yan, Jingchao Liu, Xuebo Liu, Ding Liang

Description: Pyramid Mask Text Detector; see https://arxiv.org/abs/1903.11800 for details.

Ranking Table

Date        Method                           1 - N.E.D.  Precision  Recall  F-score
2019-05-01  Tencent-DPPR Team                66.66%      64.46%     57.84%  60.97%
2019-04-30  HUST_VLRGROUP                    63.42%      61.75%     59.39%  60.55%
2019-04-30  PMTD                             63.36%      58.67%     54.97%  56.76%
2019-04-29  baseline_0.7_polygon_class_5435  63.16%      59.37%     53.23%  56.13%
2019-04-30  pursuer                          61.51%      61.31%     52.96%  56.83%
2019-04-30  MCEM                             60.56%      70.19%     54.48%  61.34%
2019-04-30  SRC-B_LSVT                       57.48%      55.40%     46.83%  50.76%
2019-04-30  VIC-LISAR                        56.00%      57.00%     47.13%  51.60%
2019-04-29  CLTDR                            52.94%      50.57%     46.11%  48.24%
2019-05-01  Fudan-Supremind Recognition      52.27%      47.93%     39.33%  43.21%
2019-04-29  Papago OCR (PixelLink+)          48.68%      50.18%     41.82%  45.62%
2019-04-20  Simple Baseline                  36.98%      28.39%     27.77%  28.08%
2019-04-30  CRAFT + TPS-ResNet v1            27.59%      30.10%     27.48%  28.73%
2019-04-30  spotter                          26.16%      21.50%     19.05%  20.20%
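To make the table's metrics concrete, the sketch below computes an F-score as the harmonic mean of precision and recall, and a 1 - N.E.D. score from ground-truth/prediction string pairs. The N.E.D. definition used here (Levenshtein distance divided by the length of the longer string, averaged over all pairs) is an assumption about the competition protocol, not something stated on this page, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def one_minus_ned(pairs):
    """1 - mean normalized edit distance over (ground_truth, prediction) pairs.

    Assumed normalization: divide by the length of the longer string.
    """
    total = 0.0
    for gt, pred in pairs:
        longest = max(len(gt), len(pred))
        if longest:  # two empty strings contribute zero distance
            total += edit_distance(gt, pred) / longest
    return 1.0 - total / len(pairs)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall taken from the Tencent-DPPR Team row.
    print(round(f_score(64.46, 57.84), 2))
    print(one_minus_ned([("abc", "abc"), ("abcd", "abcf")]))
```

Recomputing the F-score from the rounded precision and recall in a row can differ from the listed value in the last digit, since the published figures were presumably computed from unrounded values.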

Ranking Graphic
