method: CLTDR2019-04-29

Authors: Hong Wang, Weiyuan Shao, Haonan Qiu, Jianqi Ma, Zhiqiang Shen

Description: We adopt ResNet + FPN as the backbone network and use features from different FPN resolutions in order to exploit both spatial and semantic information. We first resize these features to a common resolution and concatenate them to generate a new fused feature. We then attach three branches to this fused feature. The first branch classifies text versus non-text regions. The second regresses the offsets from points inside a text region to their corresponding center line, and the last regresses the normalized distance from each point to the text boundary.
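The following is a minimal PyTorch sketch of how such a three-branch head on fused FPN features might look. The module name (FusedFPNHead), channel counts, number of FPN levels, and per-branch layer choices are our own assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedFPNHead(nn.Module):
    """Resize FPN features to one resolution, concatenate them, and predict
    (1) a text/non-text score, (2) offsets to the center line, and
    (3) a normalized distance to the text boundary."""

    def __init__(self, fpn_channels=256, num_levels=4, fused_channels=128):
        super().__init__()
        self.fuse = nn.Conv2d(fpn_channels * num_levels, fused_channels, 1)
        # Branch 1: per-pixel text / non-text classification (single logit).
        self.cls_head = nn.Conv2d(fused_channels, 1, 3, padding=1)
        # Branch 2: (dx, dy) offset from a pixel to its center-line point.
        self.offset_head = nn.Conv2d(fused_channels, 2, 3, padding=1)
        # Branch 3: normalized distance from a pixel to the text boundary.
        self.border_head = nn.Conv2d(fused_channels, 1, 3, padding=1)

    def forward(self, fpn_feats):
        # Resize every level to the resolution of the finest level, then fuse.
        target_size = fpn_feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target_size, mode="bilinear",
                                 align_corners=False) for f in fpn_feats]
        fused = F.relu(self.fuse(torch.cat(resized, dim=1)))
        return {
            "text_score": torch.sigmoid(self.cls_head(fused)),
            "center_offset": self.offset_head(fused),
            "border_dist": torch.sigmoid(self.border_head(fused)),
        }

if __name__ == "__main__":
    # Fake FPN outputs at strides 4/8/16/32 for a 640x640 input.
    feats = [torch.randn(1, 256, 160, 160), torch.randn(1, 256, 80, 80),
             torch.randn(1, 256, 40, 40), torch.randn(1, 256, 20, 20)]
    outs = FusedFPNHead()(feats)
    print({k: tuple(v.shape) for k, v in outs.items()})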
The recognition network is a CRNN-like structure. We modify the original CRNN to take a list of cropped images rather than a single image as input. We also pre-train the network parameters on the ReCTS19 Task 1 dataset to accelerate convergence.
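Below is a minimal PyTorCh-style sketch of a CRNN-like recognizer that accepts a list of cropped text images instead of a single image. The layer sizes, the fixed 32-pixel crop height, the per-crop loop, and the name ListCRNN are assumptions made for this sketch; the authors' modified CRNN may batch or pad crops differently.

import torch
import torch.nn as nn

class ListCRNN(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # Small convolutional feature extractor; height 32 -> 1 after pooling.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((8, 1), (8, 1)),  # collapse the remaining height
        )
        self.rnn = nn.LSTM(256, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)

    def forward(self, crops):
        """crops: list of (3, 32, W_i) tensors with varying widths W_i.
        Returns one (T_i, num_classes) logit sequence per crop (CTC-style)."""
        outputs = []
        for crop in crops:
            feat = self.cnn(crop.unsqueeze(0))       # (1, C, 1, W')
            seq = feat.squeeze(2).permute(0, 2, 1)   # (1, W', C)
            seq, _ = self.rnn(seq)
            outputs.append(self.fc(seq).squeeze(0))  # (W', num_classes)
        return outputs

if __name__ == "__main__":
    model = ListCRNN(num_classes=5000)  # e.g. a large Chinese character set
    crops = [torch.randn(3, 32, 100), torch.randn(3, 32, 180)]
    for logits in model(crops):
        print(tuple(logits.shape))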