Method: Baseline HMM - Task 1 - End-to-End Recognition - Information Extraction in Historical Handwritten Records

Method: Baseline HMM (2017-07-01)

Authors: Organizers

Description: This baseline system is based on Hidden Markov Models (HMMs) for optical modeling and a category-based n-gram model for language modeling.
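A category-based bigram factors the probability of a word given its predecessor as P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the category of w_i. The minimal sketch below estimates both factors by relative frequency. It assumes a hypothetical word-to-category dictionary and applies no smoothing, so it illustrates the factorization rather than reproducing the organizers' model.

```python
from collections import Counter

def train_category_bigram(sentences, word2cat):
    """Count category bigrams and word occurrences per category."""
    cat_bigrams, cat_context = Counter(), Counter()
    word_in_cat, cat_count = Counter(), Counter()
    for words in sentences:
        cats = ["<s>"] + [word2cat[w] for w in words] + ["</s>"]
        for prev, cur in zip(cats, cats[1:]):
            cat_bigrams[(prev, cur)] += 1
            cat_context[prev] += 1
        for w in words:
            word_in_cat[w] += 1           # each word belongs to one category
            cat_count[word2cat[w]] += 1
    return cat_bigrams, cat_context, word_in_cat, cat_count

def word_prob(prev_word, word, model, word2cat):
    """P(word | prev_word) = P(cat | prev_cat) * P(word | cat), unsmoothed."""
    cat_bigrams, cat_context, word_in_cat, cat_count = model
    prev_cat = "<s>" if prev_word is None else word2cat[prev_word]
    cat = word2cat[word]
    p_cat = cat_bigrams[(prev_cat, cat)] / cat_context[prev_cat]
    p_word = word_in_cat[word] / cat_count[cat]
    return p_cat * p_word

# Hypothetical toy data: categories group interchangeable words.
word2cat = {"Joan": "NAME", "Pere": "NAME", "de": "PREP", "Barcelona": "PLACE"}
sentences = [["Joan", "de", "Barcelona"], ["Pere", "de", "Barcelona"]]
model = train_category_bigram(sentences, word2cat)
p_first = word_prob(None, "Joan", model, word2cat)
```

Sharing transition statistics across a category means a name seen only once in training still inherits the contexts observed for every other name, which is the usual motivation for this factorization on small corpora.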
Then a Grammatical Inference technique known as MGGI (Morphic Generator Grammatical Inference) was used to improve the semantic accuracy of the category-based language model.
In MGGI, a priori knowledge is used to label the words of the training strings in such a way that a simple bigram model can be trained from the transformed strings.
This knowledge allows MGGI to produce a language model that captures important dependencies of the language underlying the handwritten records under consideration.
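The relabel-then-train idea can be sketched as follows. The labeling rule here is purely hypothetical: a small set of trigger words (TRIGGERS) stands in for the a priori knowledge, and every word is tagged with the section of the record it occurs in. A plain bigram is then trained on the tagged tokens, and tags are stripped to map tokens back to words. The actual knowledge and labeling used by the baseline are not described in the source.

```python
from collections import Counter, defaultdict

# Hypothetical a priori knowledge: trigger words that open a new section of
# a record (G = groom part, B = bride part, H = header before any trigger).
TRIGGERS = {"groom": "G", "bride": "B"}

def relabel(words, triggers=TRIGGERS, start="H"):
    """Renaming step: turn each word into a 'word|section' token."""
    section, out = start, []
    for w in words:
        section = triggers.get(w, section)
        out.append(f"{w}|{section}")
    return out

def train_bigram(strings):
    """Maximum-likelihood bigram over the relabelled strings."""
    counts = defaultdict(Counter)
    for s in strings:
        tokens = ["<s>"] + s + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return {prev: {cur: n / sum(c.values()) for cur, n in c.items()}
            for prev, c in counts.items()}

def unlabel(token):
    """Map a labelled token back to the original word."""
    return token.split("|", 1)[0]

record = ["groom", "Joan", "of", "Vic", "bride", "Anna", "of", "Girona"]
labelled = relabel(record)
bigram = train_bigram([labelled])
```

On this toy record a word-level bigram would give "of" → "Vic" and "of" → "Girona" equal probability, whereas the labelled bigram keeps "of|G" and "of|B" apart, so the place name is tied to the right person. This is the kind of longer-range dependency the relabeling lets a bigram capture.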
The line images were preprocessed, and a sequence of feature vectors based on gray levels was extracted from each image.
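As an illustration of gray-level features, the sketch below slides over a line image one column at a time and, for each column, averages the darkness of a fixed number of vertical cells. The cell layout and the n_cells parameter are assumptions for illustration only; the source does not specify the exact feature set. It also assumes the image height is at least n_cells.

```python
def gray_level_features(image, n_cells=4):
    """Turn a line image into one feature vector per column (frame).

    image: list of rows, each a list of gray levels in [0, 255] (0 = black ink).
    Each column is split into n_cells vertical cells; each feature is the
    mean darkness (1 - gray / 255) of one cell.
    """
    height, width = len(image), len(image[0])
    bounds = [round(i * height / n_cells) for i in range(n_cells + 1)]
    frames = []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        vec = []
        for top, bottom in zip(bounds, bounds[1:]):
            cell = column[top:bottom]
            vec.append(sum(1 - g / 255 for g in cell) / len(cell))
        frames.append(vec)
    return frames

# Toy 4x2 image: left column fully black, right column fully white.
image = [[0, 255]] * 4
frames = gray_level_features(image)
```

Each frame becomes one observation vector for the HMMs, so the image width determines the length of the observation sequence.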
Since the experiments were carried out at the license level, the lines of the test set were concatenated into whole licenses.
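Concatenating lines into licenses amounts to grouping per-line feature sequences by license and joining them in reading order. The minimal sketch below assumes each line carries a hypothetical (license_id, line_index) pair; how the organizers actually identified lines is not stated.

```python
from collections import defaultdict

def concatenate_into_licenses(lines):
    """lines: iterable of (license_id, line_index, frames) tuples.

    Returns {license_id: frames of all its lines, joined in line order}.
    """
    grouped = defaultdict(list)
    for license_id, line_index, frames in lines:
        grouped[license_id].append((line_index, frames))
    return {
        lic: [f for _, frames in sorted(parts, key=lambda p: p[0]) for f in frames]
        for lic, parts in grouped.items()
    }

# Hypothetical input: lines may arrive out of order.
lines = [("L1", 1, [[0.2]]), ("L1", 0, [[0.1]]), ("L2", 0, [[0.3]])]
licenses = concatenate_into_licenses(lines)
```

Sorting only on the line index keeps the comparison well defined even when two entries carry feature lists that cannot be ordered.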
The characters were modeled by continuous-density left-to-right HMMs with 6 states and 64 Gaussian mixture components per state.
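The two ingredients of such a character model can be sketched separately: a left-to-right transition matrix, where each state either loops on itself or advances to the next one, and a per-state Gaussian mixture emission density. The self-loop probability p_stay and the diagonal covariances are assumptions of this sketch; the source gives only the number of states (6) and of mixture components (64).

```python
import math

def left_to_right_transitions(n_states=6, p_stay=0.6):
    """Left-to-right topology: self-loop or move to the next state.

    The last row sums to p_stay; the remaining mass is the exit
    probability into the next character model.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        if i + 1 < n_states:
            A[i][i + 1] = 1.0 - p_stay
    return A

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a diagonal-covariance Gaussian mixture (one per state)."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        logs.append(ll)
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logs))

A = left_to_right_transitions()
ll = gmm_log_likelihood([0.0], [1.0], [[0.0]], [[1.0]])
```

With 64 components per state, weights, means and variances would simply hold 64 entries each; the function itself does not depend on the number of components.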
These models were estimated with the Baum-Welch algorithm, and decoding was performed with the Viterbi algorithm.
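For reference, the sketch below shows log-space Viterbi decoding over a single small HMM with discrete toy emissions. In the described system, decoding runs over the character HMMs combined with the language model, and emissions come from the Gaussian mixtures. The two-state model and the log_emit callback are hypothetical simplifications; Baum-Welch training is not sketched here.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, log_init, log_trans, log_emit):
    """Most likely state path and its log score.

    log_init[s], log_trans[s][t]: log probabilities; log_emit(s, o): log p(o | s).
    """
    n = len(log_init)
    delta = [log_init[s] + log_emit(s, observations[0]) for s in range(n)]
    back = []
    for o in observations[1:]:
        new_delta, ptr = [], []
        for t in range(n):
            best_s = max(range(n), key=lambda s: delta[s] + log_trans[s][t])
            new_delta.append(delta[best_s] + log_trans[best_s][t] + log_emit(t, o))
            ptr.append(best_s)
        delta, back = new_delta, back + [ptr]
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to the start
        state = ptr[state]
        path.append(state)
    return path[::-1], max(delta)

# Hypothetical two-state left-to-right model with discrete emissions.
EMIT = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
log_init = [safe_log(1.0), safe_log(0.0)]
log_trans = [[safe_log(0.5), safe_log(0.5)], [safe_log(0.0), safe_log(1.0)]]
path, score = viterbi("aabb", log_init, log_trans, lambda s, o: safe_log(EMIT[s][o]))
```

Working in log space avoids numerical underflow on long observation sequences, which matters here because whole licenses, rather than single lines, are decoded.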