method: Arbitrary shape scene text recognition based on CNN and Attention Enhanced Bi-directional LSTM 2019-04-21

Authors: Xufuyong

Description: We use an attention-enhanced network architecture with flexible correction to recognize text in arbitrary scenes. The method has two parts: a text correction network, and a recognition network built from a convolutional neural network and an attention-enhanced LSTM. The correction network adaptively transforms the input image into a new image in which the text is corrected. This reduces the difficulty of recognition and lets the attention-enhanced sequence recognition network predict the character sequence directly from the corrected image. The text correction network uses a CNN to divide the image into sections and then predicts an offset for each section. The offsets are applied to the pixels of the original image to obtain the corrected image, which is then passed to the text recognition network. The main structure of the recognition network is the CNN-BLSTM framework: the encoder uses the CRNN architecture, and the decoder is based on a bidirectional GRU that predicts the text in both directions.
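
The correction step described above (split the image into sections, take one offset per section, apply it to the pixels) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `rectify`, the equal-width vertical strips, and the integer row offsets are all assumptions made for illustration. In the method itself the offsets are predicted by a CNN, and the description does not specify their exact form; here they are supplied by hand.

```python
def rectify(image, offsets, fill=0):
    """Shift each vertical strip of `image` by its own integer row offset.

    image   : list of rows, each a list of pixel values (H x W)
    offsets : one integer per strip; a positive value moves the strip down
    fill    : value used where a shifted strip leaves no source pixel

    Hypothetical helper: strips are equal-width vertical slices and the
    offsets are given directly instead of being predicted by a network.
    """
    height, width = len(image), len(image[0])
    n_strips = len(offsets)
    out = [[fill] * width for _ in range(height)]
    for x in range(width):
        # Map the column index to the strip that contains it.
        dy = offsets[x * n_strips // width]
        for y in range(height):
            src = y - dy  # source row that lands on output row y
            if 0 <= src < height:
                out[y][x] = image[src][x]
    return out


# Toy "text line" that sits one row higher in the right half of the image.
skewed = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
# Moving the right strip down by one row straightens the line.
straight = rectify(skewed, offsets=[0, 1])
```

In this toy case `straight` has all of its non-zero pixels on the middle row, which is the kind of input the recognition network is meant to receive.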