method: Adelaide_ConvLSTMs2016-03-17

Authors: Hui Li, Chunhua Shen, Peng Wang

Description: We tackle the problem of end-to-end scene text detection and recognition by CNN and LSTMs.
Firstly we detect all text lines in an image by using a 37-class (26 case-insensitive characters, 10 digits and background) CNN in a sliding window fashion, which results in a high recall.
Then we regard the text line recognition as a sequence labelling problem.
A recurrent neural network (RNN) with long short-term memorys (LSTMs) is trained to recognize the sequential features extracted from the text line image via CNNs.
We introduce the background class into RNN for recognition, so that the well-trained LSTMs can filter out false positives in the text lines, and separate each words in the text line as well.
By exploring context information and avoiding errors caused by separation, this method achieves state-of-the-art performance in the end-to-end task.
Some detailed description can refer to [1].

[1]Hui Li, Chunhua Shen, "Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs", [online] http://arxiv.org/abs/1601.05610