Authors: Dongbao Yang, Yudi Chen, Zhi Qiao, Xugong Qin, Yu Zhou (IIE, CAS)
Description: This method is a modified version of Mask R-CNN. Cascaded structures are
appended to the box head and the mask head to improve the performance of both
branches, which leads to more accurate localization. We perform anchor
clustering on the training set to choose appropriate anchor aspect ratios for
the RPN. Deformable convolutions are inserted in the last three stages of the backbone
to better fit the shape variation of text. An FPN module is used for feature fusion
and dynamic anchor selection.
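
The anchor clustering step can be illustrated with a minimal sketch. The description does not say which clustering algorithm or ratio convention was used, so everything here is an assumption: the sketch runs a 1-D k-means on the log of height/width ratios of ground-truth boxes, and the function name `cluster_aspect_ratios` and its parameters are hypothetical.

```python
import math


def cluster_aspect_ratios(boxes, k=3, iters=50):
    """Return k anchor aspect ratios (height / width), sorted ascending.

    boxes: iterable of (width, height) pairs from the training set.
    Assumption: 1-D k-means in log space, so that ratios such as 1:4
    and 4:1 are treated symmetrically. Not the authors' confirmed method.
    """
    logs = sorted(math.log(h / w) for w, h in boxes if w > 0 and h > 0)
    n = len(logs)
    if n < k:
        raise ValueError("need at least k valid boxes")

    # Deterministic initialisation: pick centers at evenly spaced quantiles.
    centers = [logs[int((i + 0.5) * n / k)] for i in range(k)]

    for _ in range(iters):
        # Assignment step: attach each sample to its nearest center.
        groups = [[] for _ in range(k)]
        for v in logs:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)

        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    return sorted(math.exp(c) for c in centers)


if __name__ == "__main__":
    # Toy (width, height) pairs: wide, square, and tall text boxes.
    toy_boxes = [(100, 20), (100, 25), (50, 50), (40, 40), (20, 100), (25, 100)]
    print(cluster_aspect_ratios(toy_boxes, k=3))
```

The resulting ratios would then be supplied to the RPN anchor generator in place of the default aspect ratios; how that is configured depends on the detection framework in use.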