method: Enhancing Text Recognition Accuracy by Adding External Language Model
Date: 2017-06-30

Authors: Ahmed Sabir

Description: This approach integrates an independent language model with a pre-trained deep network. The advantage of a trainable language model is that it refines the probabilities of the words selected by the network using external knowledge (in this case, a unigram language model learnt from a freely available corpus). This hybrid approach also opens the possibility of introducing higher-order trainable language models. The method applies a unigram language model (LM) on top of a deep CNN with a pre-defined 90k-word dictionary [1]. The unigram model was trained on the Opensubtitles corpus, a database of movie subtitles that contains around 3 million words (a combination of words and digits). Only the five most probable words from the CNN softmax layer, restricted to the pre-defined dictionary, are kept. These candidates are then reranked by combining the softmax output with the unigram probabilities estimated from the large-scale English corpus.
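The reranking step can be illustrated with a minimal sketch. The description does not state how the softmax output and the unigram probabilities are combined, so the log-linear combination below, the `lm_weight` parameter, the add-one smoothing, and the toy corpus and candidate list are all assumptions made for illustration, not the submitted method's actual implementation.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Estimate add-one-smoothed unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts)

    def prob(word):
        # Assumption: add-one smoothing, so unseen words keep a small
        # non-zero probability (one extra slot is reserved for them).
        return (counts.get(word, 0) + 1) / (total + vocab_size + 1)

    return prob


def rerank(candidates, unigram_prob, lm_weight=1.0, top_k=5):
    """Rerank (word, softmax_prob) pairs with a unigram language model.

    Assumption: scores are combined log-linearly,
    score = log p_cnn(word) + lm_weight * log p_lm(word).
    """
    # Keep only the top_k most probable CNN candidates.
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    scored = [
        (word, math.log(max(p, 1e-12)) + lm_weight * math.log(unigram_prob(word)))
        for word, p in top
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Hypothetical toy corpus standing in for the subtitle corpus.
corpus = "the quick brown fox jumps over the lazy dog the end".split()
unigram = train_unigram(corpus)

# Hypothetical top-5 softmax output of the CNN for one word image.
cnn_top5 = [("tho", 0.40), ("the", 0.35), ("then", 0.15), ("thy", 0.06), ("toe", 0.04)]

best_word, best_score = rerank(cnn_top5, unigram)[0]
print(best_word)
```

In this toy case the CNN slightly prefers "tho", but the frequent word "the" wins once the unigram probability is taken into account. Setting `lm_weight=0.0` recovers the original CNN ranking.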
[1] Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1), 1-20.