Method: AlimamaCV - Task 4 - End-to-End - Born-Digital Images (Web and Email)

method: AlimamaCV (2016-05-13)

Authors: Quan Chen, Tiezheng Ge, Zhiqiang Zhang, Minghui Li, Kun Gai

Description: We approach this task with a combination of three deep neural networks and a language model. Specifically, an LSTM model performs word recognition on features generated by a CNN model. The final words are decoded with a bi-gram language model, and their locations are refined by a location regression network. Two internal text corpora are used in the training procedure. For the "strongly" and "weakly" versions, the corresponding given vocabulary is simply used as a filter on the final output.
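
The last two stages described above (bi-gram decoding and vocabulary filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bigram_decode` and `vocabulary_filter`, the `lm_weight` and `floor` parameters, the toy probabilities, and the case-insensitive exact match are all assumptions made for illustration. The sketch also assumes the recogniser already emits one probability distribution per character position, and that the bi-gram model works at character level, which the description does not specify.

```python
import math


def bigram_decode(char_probs, bigram, lm_weight=1.0, floor=1e-6):
    """Viterbi search for the string maximising
    sum(log p_recogniser) + lm_weight * sum(log p_bigram).

    char_probs: list of dicts {char: prob}, one per character position
                (hypothetical recogniser output).
    bigram:     dict {(prev_char, char): prob}; "^" marks the word start.
    floor:      probability used for unseen bi-grams and zero scores.
    """
    if not char_probs:
        return ""
    # best[c] = (score, text) of the best partial path ending in character c
    best = {}
    for c, p in char_probs[0].items():
        lm = bigram.get(("^", c), floor)
        best[c] = (math.log(max(p, floor)) + lm_weight * math.log(lm), c)
    for step in char_probs[1:]:
        new_best = {}
        for c, p in step.items():
            emit = math.log(max(p, floor))
            new_best[c] = max(
                (score + emit + lm_weight * math.log(bigram.get((prev, c), floor)),
                 text + c)
                for prev, (score, text) in best.items()
            )
        best = new_best
    return max(best.values())[1]


def vocabulary_filter(words, vocabulary):
    """Keep only words found in the given vocabulary (assumed
    case-insensitive exact match)."""
    vocab = {v.lower() for v in vocabulary}
    return [w for w in words if w.lower() in vocab]


# Toy data (made up): the recogniser slightly prefers "cot" over "cat".
toy_probs = [
    {"c": 0.9, "e": 0.1},
    {"a": 0.45, "o": 0.55},
    {"t": 1.0},
]
toy_bigram = {
    ("^", "c"): 0.5,
    ("c", "a"): 0.6,
    ("c", "o"): 0.1,
    ("a", "t"): 0.5,
    ("o", "t"): 0.5,
}

decoded = bigram_decode(toy_probs, toy_bigram)            # language model on
greedy = bigram_decode(toy_probs, toy_bigram, lm_weight=0.0)  # language model off
kept = vocabulary_filter([decoded, "xyz"], ["CAT", "dog"])
```

In this toy case the bi-gram scores tip the ambiguous middle character towards "a", so `decoded` is "cat", whereas `greedy` (language model disabled) is "cot". The vocabulary filter then drops any decoded word that is not in the supplied list.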