method: HUST_VLRGROUP (2019-04-30)

Authors: Zhen Zhu, Mengde Xu, Mingkun Yang, Hui Zhang, Jiehua Yang, Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu

Description: We are from Huazhong University of Science and Technology. Our detection method is adapted from Mask TextSpotter with a ResNet-50-FPN backbone; we use only its detection part and omit the text recognition part. We first cluster aspect ratios in the training set and, based on the result, set the anchor scales for the region proposal network to (0.1, 0.18, 0.25, 0.5, 1.0, 2.0). To obtain high-quality proposals, we incorporate Cascade R-CNN into the network, setting the positive IoU thresholds to (0.7, 0.5, 0.6, 0.7) and the negative IoU thresholds to (0.3, 0.5, 0.6, 0.7). We also replace the convolutions in the last two stages with modulated deformable convolutions to enhance the model's ability to capture the large or long text instances that appear widely in the dataset. The detection network is trained with the shorter side of the input image set to 1600. For better performance, we apply multi-scale testing at scales (1000, 1200, 1400, 1600, 1800). The final results are obtained by collecting the detections from all scales, discarding boxes whose scores are below a threshold of 0.7, and then applying standard non-maximum suppression with the overlap threshold set to 0.1.
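The multi-scale merging step described above (score filtering followed by non-maximum suppression) could be sketched roughly as follows. This is a minimal illustration, not our actual code: it assumes axis-aligned boxes given as (x1, y1, x2, y2, score) that have already been mapped back to the original image coordinates, whereas the real pipeline may operate on polygons or masks. The names `iou` and `merge_multiscale` are hypothetical.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_multiscale(
    detections_per_scale: List[List[Box]],
    score_thresh: float = 0.7,   # score threshold from the description
    nms_overlap: float = 0.1,    # NMS overlap threshold from the description
) -> List[Box]:
    """Pool detections from all test scales, filter by score, then run NMS."""
    # Step 1: pool all scales and drop boxes scoring below the threshold.
    candidates = [
        box
        for dets in detections_per_scale
        for box in dets
        if box[4] >= score_thresh
    ]
    # Step 2: greedy NMS, visiting boxes from highest to lowest score.
    candidates.sort(key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(box, k) <= nms_overlap for k in kept):
            kept.append(box)
    return kept
```

With an overlap threshold as low as 0.1, nearly any two boxes that touch the same text instance are treated as duplicates, so only the highest-scoring detection across the five scales survives for each instance.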