Authors: Donglai Xiang, Jiaming Guo, Liangrui Peng, Changsong Liu
Description: The method uses a CNN with a multi-level feature pyramid. It consists of a modified FCN with residual connections, which serves as a proposal generator, and a Fast R-CNN detector with Rotation RoI pooling for multi-oriented text detection. First, an image is fed into the FCN, which predicts a saliency map giving the probability that each pixel belongs to a text region. The map is then binarized at multiple thresholds, and connected components (CCs) are extracted at each threshold. CCs that break into multiple parts at a higher threshold are selected, and their bounding boxes serve as region proposals. Next, the features of each region proposal are extracted by Rotation RoI pooling and passed to the Fast R-CNN network, which filters out non-text regions and regresses the bounding boxes to more accurate positions. Finally, non-maximum suppression is applied to obtain the text detection results.
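
The proposal-generation step (multi-threshold binarization followed by CC selection) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the use of 4-connectivity, the comparison of consecutive threshold pairs, and the choice to return the axis-aligned bounding box of the parent CC are all assumptions made for illustration. The described method targets multi-oriented text, so the actual proposals may well be rotated boxes.

```python
from collections import deque


def connected_components(prob, threshold):
    """Return 4-connected components of pixels with prob >= threshold.

    Each component is a set of (row, col) tuples. 4-connectivity is an
    assumption; the description does not state which connectivity is used.
    """
    rows, cols = len(prob), len(prob[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if prob[r][c] < threshold or (r, c) in seen:
                continue
            comp = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and prob[ny][nx] >= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(comp)
    return components


def bounding_box(comp):
    """Axis-aligned box (x_min, y_min, x_max, y_max) of a component."""
    ys = [y for y, _ in comp]
    xs = [x for _, x in comp]
    return (min(xs), min(ys), max(xs), max(ys))


def propose_regions(prob, thresholds):
    """Select CCs that split into two or more parts at the next higher threshold."""
    thresholds = sorted(thresholds)
    proposals = []
    for low, high in zip(thresholds, thresholds[1:]):
        high_ccs = connected_components(prob, high)
        for parent in connected_components(prob, low):
            # A high-threshold CC lies entirely inside one low-threshold CC,
            # so testing a single pixel is enough to decide membership.
            parts = sum(1 for child in high_ccs if next(iter(child)) in parent)
            if parts >= 2:
                box = bounding_box(parent)
                if box not in proposals:
                    proposals.append(box)
    return proposals


# Toy saliency map: the top blob splits in two at threshold 0.8,
# while the bottom blob stays in one piece.
saliency = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.6, 0.9, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.0],
]
print(propose_regions(saliency, [0.5, 0.8]))  # [(1, 1, 5, 1)]
```

In this toy example only the top blob becomes a proposal, because it is the only component that separates into two parts when the threshold rises from 0.5 to 0.8. In the full pipeline such proposals would then go through Rotation RoI pooling, Fast R-CNN filtering and regression, and non-maximum suppression.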