Authors: Hanqin Wang, Jie Qin, Fan Zhu, Li Liu, and Ling Shao (Inception Institute of Artificial Intelligence)
Description: We propose a one-stage detection network for text localization based on SSD (Single Shot MultiBox Detector). To adapt SSD to this task, we make the following enhancements to the original network: 1) We integrate Inception blocks with different kernel widths into our model to improve the capabilities of both the localization and confidence layers. 2) In the original SSD structure, there is no interaction between the localization and confidence layers. In our model, we incorporate the output feature map of the localization layer into the computation of the confidence layer, so that the confidence layer focuses more on meaningful regions. 3) To localize wide boxes, we augment the training boxes by splitting wide boxes into several narrow ones. We also improve the matching between prior boxes and target boxes based on their ratios. In addition, we apply the following post-processing steps: 1) We fuse the outputs of several networks using an intuitive merging strategy. 2) We further remove false positives based on the average pixel intensity.
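The wide-box augmentation can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and a hypothetical maximum width/height ratio `max_ratio`. The description does not state the threshold or how a wide box is divided, so splitting into the fewest equal-width pieces that satisfy the ratio is our assumption, and the helper names `split_wide_box` and `augment_boxes` are hypothetical.

```python
import math
from typing import List, Tuple

# Axis-aligned box: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def split_wide_box(box: Box, max_ratio: float = 5.0) -> List[Box]:
    """Split a box into equal-width pieces whose width/height ratio
    does not exceed max_ratio (a hypothetical threshold, not from the source)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    if height <= 0 or width / height <= max_ratio:
        return [box]  # already narrow enough (or degenerate): keep unchanged
    # Fewest pieces such that every piece satisfies width / height <= max_ratio.
    n_pieces = math.ceil(width / (height * max_ratio))
    step = width / n_pieces
    return [
        (x_min + i * step, y_min, x_min + (i + 1) * step, y_max)
        for i in range(n_pieces)
    ]


def augment_boxes(boxes: List[Box], max_ratio: float = 5.0) -> List[Box]:
    """Replace every wide training box with its narrow pieces (assumed scheme)."""
    out: List[Box] = []
    for box in boxes:
        out.extend(split_wide_box(box, max_ratio))
    return out


if __name__ == "__main__":
    # A 120x10 box (ratio 12) becomes three 40x10 pieces; a 30x10 box is kept.
    print(augment_boxes([(0, 0, 120, 10), (0, 0, 30, 10)]))
```

Choosing the smallest number of pieces keeps each piece as wide as the ratio limit allows, so the split boxes still match prior boxes with large ratios rather than collapsing into many near-square fragments.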
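The description does not specify the intensity criterion used to remove false positives. The sketch below assumes one plausible rule: a detection is discarded when the mean grayscale value inside its box falls outside a band [`low`, `high`], on the assumption that near-uniform very dark or very bright regions rarely contain text. The thresholds, the grayscale image given as a list of rows, and the function names `mean_intensity` and `filter_by_intensity` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Integer pixel box: (x_min, y_min, x_max, y_max), with x_max / y_max exclusive.
Box = Tuple[int, int, int, int]


def mean_intensity(gray: Sequence[Sequence[float]], box: Box) -> float:
    """Average pixel value of a grayscale image (list of rows) inside box,
    after clipping the box to the image bounds. Empty boxes return 0.0."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)
    values = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values) if values else 0.0


def filter_by_intensity(
    gray: Sequence[Sequence[float]],
    boxes: List[Box],
    low: float = 20.0,   # hypothetical lower bound
    high: float = 235.0,  # hypothetical upper bound
) -> List[Box]:
    """Keep only boxes whose mean intensity lies in [low, high] (assumed rule)."""
    return [b for b in boxes if low <= mean_intensity(gray, b) <= high]


if __name__ == "__main__":
    # 4x4 image: left half black (0), right half mid-gray (128).
    image = [[0, 0, 128, 128] for _ in range(4)]
    detections = [(0, 0, 2, 4), (2, 0, 4, 4), (0, 0, 4, 4)]
    # The all-black box is dropped; the other two are kept.
    print(filter_by_intensity(image, detections))
```

In practice the band would be tuned on validation data; a per-box contrast measure could be substituted if mean intensity alone proves too coarse, but that variant is not described in the source.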