method: Tencent-DPPR Team (2019-05-01)
Authors: Haoxi Li, Sicong Liu, Longhuang Wu, Shangxuan Tian, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Xucheng Yin, Lei Xiao
Description: The detection model is based on a two-stage detection method, and the recognition method is an ensemble of several CNN+BiGRU+CTC, CNN+BiGRU+Attention_Seq2Seq, and CNN+SelfAttention+CTC models.
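The CTC-based recognizers above all share a final decoding step: collapsing per-frame predictions into a transcription. A minimal sketch of greedy (best-path) CTC decoding, with an illustrative toy alphabet and function name not taken from the Tencent-DPPR system:

```python
# Hypothetical sketch of CTC greedy decoding, the last step of a
# CNN+BiGRU+CTC or CNN+SelfAttention+CTC recognizer. The blank index
# and charset below are assumptions for illustration only.

BLANK = 0  # CTC blank label, assumed to be index 0

def ctc_greedy_decode(frame_argmax, charset):
    """Collapse repeated labels, then drop blanks (best-path decoding)."""
    out = []
    prev = None
    for idx in frame_argmax:
        if idx != prev and idx != BLANK:
            out.append(charset[idx - 1])  # charset holds non-blank symbols
        prev = idx
    return "".join(out)

# Per-frame argmax over a toy alphabet {a, b, c}:
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 3], "abc"))  # -> "aabc"
```

The blank label lets the decoder keep genuinely repeated characters ("aa") apart from repeated frames of the same character.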
method: HUST_VLRGROUP (2019-04-30)
Authors: Mingkun Yang, Hui Zhang, Zhen Zhu, Mengde Xu, Jiehua Yang, Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu
Description: We are from Huazhong University of Science and Technology. We adopt a two-stage method for this e2e task. Specifically, our detection method is modified from Mask TextSpotter [1] based on the ResNet-50-FPN backbone. We only use its detection part while omitting the text recognition part. We first conduct an aspect-ratio clustering on the training set and set the anchor scales for the region proposal network to (0.1, 0.18, 0.25, 0.5, 1.0, 2.0). In order to produce high-quality proposals, we incorporate Cascade R-CNN [2] into the network and set the positive IoUs to (0.7, 0.5, 0.6, 0.7) and the negative IoUs to (0.3, 0.5, 0.6, 0.7). We also change the convolutions in the last two stages to modulated deformable convolutions [3] to enhance the model's ability to capture the large or long text instances that widely appear in the dataset. The detection network is trained with the minimum side of the input image set to 1600. We conduct multi-scale testing for better performance at scales (1000, 1200, 1400, 1600, 1800). The final results from multiple scales are obtained by filtering out boxes whose scores are under a threshold of 0.7 and then applying standard non-maximum suppression with the overlap threshold set to 0.1. In addition, we mainly use CRNN equipped with multiple advanced backbones and some improvements to obtain the final results. To handle some irregular text instances, we add a rectification module before recognition.
[1] Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV 2018.
[2] Zhaowei Cai, Nuno Vasconcelos. Cascade R-CNN: Delving Into High Quality Object Detection. CVPR 2018.
[3] Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai. Deformable ConvNets v2: More Deformable, Better Results. CoRR abs/1811.11168 (2018).
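The multi-scale fusion step described above (score filter at 0.7, then NMS at overlap 0.1) can be sketched in pure Python for axis-aligned boxes; the function names are illustrative, and the real system may operate on rotated or polygonal boxes:

```python
# Minimal sketch of the described post-processing: keep detections with
# score >= 0.7, then greedy NMS with IoU overlap threshold 0.1.
# Boxes are (x1, y1, x2, y2) tuples; names are assumptions, not the
# HUST_VLRGROUP code.

SCORE_THRESH = 0.7
NMS_IOU = 0.1

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, scores):
    """Return indices of boxes kept after score filtering and NMS."""
    keep = []
    order = sorted((i for i, s in enumerate(scores) if s >= SCORE_THRESH),
                   key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= NMS_IOU for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.75]
print(filter_and_nms(boxes, scores))  # -> [0, 2]
```

The very low overlap threshold of 0.1 is aggressive: with multi-scale testing, the same text instance is typically detected at several scales, so near-duplicates must be suppressed even when their overlap is modest.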
method: PMTD (2019-04-30)
Authors: Qixin Yan, Jingchao Liu, Xuebo Liu, Ding Liang
Description: Pyramid Mask Text Detector; see https://arxiv.org/abs/1903.11800 for details.
Date | Method | 1 - N.E.D. | Precision | Recall | F-score
---|---|---|---|---|---
2019-05-01 | Tencent-DPPR Team | 66.66% | 64.46% | 57.84% | 60.97%
2019-04-30 | HUST_VLRGROUP | 63.42% | 61.75% | 59.39% | 60.55%
2019-04-30 | PMTD | 63.36% | 58.67% | 54.97% | 56.76%
2019-04-29 | baseline_0.7_polygon_class_5435 | 63.16% | 59.37% | 53.23% | 56.13%
2019-04-30 | pursuer | 61.51% | 61.31% | 52.96% | 56.83%
2019-04-30 | MCEM | 60.56% | 70.19% | 54.48% | 61.34%
2019-04-30 | SRC-B_LSVT | 57.48% | 55.40% | 46.83% | 50.76%
2019-04-30 | VIC-LISAR | 56.00% | 57.00% | 47.13% | 51.60%
2019-04-29 | CLTDR | 52.94% | 50.57% | 46.11% | 48.24%
2019-05-01 | Fudan-Supremind Recognition | 52.27% | 47.93% | 39.33% | 43.21%
2019-04-29 | Papago OCR (PixelLink+) | 48.68% | 50.18% | 41.82% | 45.62%
2019-04-20 | Simple Baseline | 36.98% | 28.39% | 27.77% | 28.08%
2019-04-30 | CRAFT + TPS-ResNet v1 | 27.59% | 30.10% | 27.48% | 28.73%
2019-04-30 | spotter | 26.16% | 21.50% | 19.05% | 20.20%
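The "1 - N.E.D." column is a transcription-similarity score: one minus the edit distance between prediction and ground truth, normalized and averaged over text instances. A sketch of the core computation, assuming normalization by the longer string's length (the competition's exact matching protocol may differ):

```python
# Illustrative sketch of a 1 - N.E.D. score: one minus Levenshtein
# distance normalized by the longer string's length, averaged over
# (prediction, ground-truth) pairs. Not the official evaluation code.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def one_minus_ned(pairs):
    """Average 1 - NED over (prediction, ground_truth) pairs."""
    total = 0.0
    for pred, gt in pairs:
        denom = max(len(pred), len(gt)) or 1  # avoid division by zero
        total += 1.0 - levenshtein(pred, gt) / denom
    return total / len(pairs)

print(one_minus_ned([("text", "test"), ("ocr", "ocr")]))  # -> 0.875
```

Unlike the binary match behind precision/recall/F-score, this metric gives partial credit for near-correct transcriptions, which is why the 1 - N.E.D. and F-score rankings above do not fully agree (e.g. MCEM has the best F-score but not the best 1 - N.E.D.).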