Authors: Dai Yuchen
Description: I'm from Shanghai Jiao Tong University. This method uses Deformable Convolutional Nets as the base architecture. A ResNet-101 serves as the backbone convolutional network for feature extraction. During feature extraction, deformable convolution layers are added so that the deformable kernels can capture text patterns. A region proposal network, built from 3x3 convolutions, then generates regions of interest (ROIs). A deformable ROI pooling layer crops the ROIs to fixed-size feature maps. Finally, these ROI representations are sent to the classification and box-regression branches.
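
To illustrate the deformable sampling idea mentioned in the description, here is a minimal pure-Python sketch of a single 3x3 deformable convolution response. It is not the author's implementation: the function names (`bilinear_sample`, `deformable_conv3x3_at`), the toy feature map, and the hand-picked offsets are all assumptions made for illustration. In the actual method the offsets would be predicted by a learned layer rather than supplied by hand.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map at fractional (y, x).

    Positions outside the map contribute zero.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def deformable_conv3x3_at(feature, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution at output location (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th kernel tap away from its regular
    grid position. Hypothetical helper: offsets are given by hand here.
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(feature, cy + ky + dy, cx + kx + dx)
            out += kernel[ky + 1][kx + 1] * sample
            k += 1
    return out


# Toy 5x5 feature map with value 5*y + x at each cell (assumed data).
feature = [[5.0 * y + x for x in range(5)] for y in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]

# Zero offsets reduce to an ordinary 3x3 convolution.
regular = deformable_conv3x3_at(feature, kernel, [(0.0, 0.0)] * 9, 2, 2)

# Shifting every tap by half a pixel samples between grid cells.
shifted = deformable_conv3x3_at(feature, kernel, [(0.5, 0.5)] * 9, 2, 2)
```

With all offsets set to zero the sampling grid is the regular 3x3 neighborhood, so the result matches a standard convolution. Non-zero offsets move each tap independently, which is what lets the kernel adapt its receptive field to irregular shapes such as text.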