Authors: Pengwen Dai (IIE, CAS)
Description: This method builds on the Mask R-CNN instance segmentation framework and uses ResNet-50 as the backbone network. We add a Feature Pyramid Network (FPN) with irregular convolution filters, which give receptive fields better suited to extremely tall or wide scene text. We also employ multi-scale region-of-interest (ROI) pooling with an attention mechanism whose attention weights are learnable. Finally, the mask branch uses context-aware features to capture global information.
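The description does not say how the attention weights over scales are formed. The sketch below is one minimal illustration, assuming one learnable scalar logit per pyramid level that is normalized with a softmax. It also uses plain max ROI pooling rather than RoIAlign. One image-space box is pooled from every level and the results are fused by the softmax weights. The names `roi_max_pool`, `attention_multiscale_roi` and `level_logits` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def roi_max_pool(feature, box, out_size=2):
    """Max-pool an integer box (x0, y0, x1, y1) of a (C, H, W) map to (C, out_size, out_size)."""
    x0, y0, x1, y1 = box
    region = feature[:, y0:y1, x0:x1]
    c, h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Make every bin cover at least one cell, even for very thin boxes.
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out


def attention_multiscale_roi(features, strides, box, level_logits, out_size=2):
    """Pool one image-space box from every pyramid level and fuse with softmax weights.

    level_logits stands in for the learnable attention parameters. Using one scalar
    per level is a hypothetical simplification; in training these values would be
    updated by backpropagation, but here they are simply passed in.
    """
    x0, y0, x1, y1 = box
    pooled = []
    for feat, s in zip(features, strides):
        # Project the box onto this level's grid (stride s) and clip it to the map.
        _, h, w = feat.shape
        fx0 = min(int(x0 // s), w - 1)
        fy0 = min(int(y0 // s), h - 1)
        fx1 = min(max(int(np.ceil(x1 / s)), fx0 + 1), w)
        fy1 = min(max(int(np.ceil(y1 / s)), fy0 + 1), h)
        pooled.append(roi_max_pool(feat, (fx0, fy0, fx1, fy1), out_size))
    weights = softmax(level_logits)
    # Weighted sum over levels: (L,) x (L, C, k, k) -> (C, k, k).
    fused = np.tensordot(weights, np.stack(pooled), axes=1)
    return fused, weights


# Usage: a wide text box pooled from two levels (strides 4 and 8) with equal logits.
feats = [np.ones((4, 64, 64)), np.full((4, 32, 32), 3.0)]
fused, w = attention_multiscale_roi(feats, [4, 8], (10, 20, 90, 40), [0.0, 0.0])
print(fused.shape)  # (4, 2, 2); every entry is 0.5 * 1.0 + 0.5 * 3.0 = 2.0
```

With learned logits, the network can favor finer or coarser pyramid levels for each ROI instead of committing to a single level through a fixed size heuristic. A feature-dependent attention, with weights predicted from the pooled features themselves, would be an equally plausible reading of the description.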