Method: Resnet based uni-gram method for segment-free liaison handwriting recognition and NER tagging - Task 1 - End to End Recognition - Information Extraction in Historical Handwritten Records

method: Resnet based uni-gram method for segment-free liaison handwriting recognition and NER tagging (2017-07-08)

Authors: Xiangping Wu, Qingcai Chen, Jinghan You

Description: For the competition on Information Extraction in Historical Handwritten Records, our method is divided into two parts: character recognition and named entity recognition.
In the handwritten old Catalan text recognition stage, we present a novel, segmentation-free, word-wise character recognition method that uses no external linguistic knowledge. In this method, the position of each character is converted into a vector. A kind of uni-gram model is then constructed and integrated into a residual neural network (ResNet) for training. The whole character recognition process consists of 3 steps: (1) data pre-processing; (2) model building and training; and (3) model prediction.
First, data pre-processing. We normalize each color word image to a size of 100×200, chosen from statistics of the word image sizes, and add a terminator “*” at the end of each word.
Second, model building and training. The first part of the recognition model uses a ResNet to extract features from the input image and generate a feature vector. At the same time, we randomly generate a multi-dimensional vector for each character position. Next, we combine the feature vector generated by the ResNet with the randomly generated position vector into a new feature vector. At the end of the network we add a fully connected neural network with a hidden layer and a dropout rate of 0.5. The output layer has 60 units in total: 59 classes of basic Catalan characters and the terminator “*”.
Third, model prediction. From statistics of the training set, we calculated the length of the longest word including the terminator. The predicted length of each word in the test set is then set to 15 to ensure that the end of a long word can be identified. Finally, we remove all terminators to obtain the word predictions.
This character recognition method does not depend on external language information such as dictionaries. The main contribution of the position information is to guide the ResNet to automatically learn how to segment characters and to identify the location of each character.
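The data flow around the network can be illustrated with a minimal Python sketch. It is not the authors' code: the ResNet and the classifier are left out, the 4-value image feature is a stand-in, and the names `POS_DIM`, `make_position_vectors`, `position_targets`, `combine` and `decode` are hypothetical. Padding every position after the word with the terminator, and combining by plain concatenation, are assumptions; the text only states that a terminator is appended, that 15 positions are predicted, and that all terminators are removed.

```python
import random

TERMINATOR = "*"
MAX_LEN = 15   # prediction length stated in the description
POS_DIM = 8    # assumed size of each position vector (not given in the text)


def make_position_vectors(max_len=MAX_LEN, dim=POS_DIM, seed=0):
    """Create one fixed random vector per character position."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(max_len)]


def position_targets(word, max_len=MAX_LEN):
    """Per-position labels: the word's characters, then terminators.

    Filling all remaining positions with the terminator is an assumption.
    """
    if len(word) + 1 > max_len:
        raise ValueError("word plus terminator exceeds max_len")
    return list(word) + [TERMINATOR] * (max_len - len(word))


def combine(image_feature, position_vector):
    """Concatenate the image feature with one position vector (assumed)."""
    return list(image_feature) + list(position_vector)


def decode(predicted_chars):
    """Remove every terminator from the per-position predictions."""
    return "".join(c for c in predicted_chars if c != TERMINATOR)


position_vectors = make_position_vectors()
image_feature = [0.1, 0.2, 0.3, 0.4]  # stand-in for the ResNet feature vector
inputs = [combine(image_feature, pv) for pv in position_vectors]
targets = position_targets("pages")   # illustrative word
print(len(inputs), len(inputs[0]))
print(decode(targets))
```

Under these assumptions, one word image yields 15 classifier inputs that share the same image feature and differ only in the position vector, which is what lets a single per-character classifier read out a whole word without explicit segmentation.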
In the named entity recognition stage, we simply use CRF sequence tagging via the CRF++ toolkit. We first predict the category from the transcribed record using a first template. We then predict the person using a second template, the transcribed record, and the category predicted in the previous step.
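The two-step tagging can be sketched as data preparation for CRF++, whose input format is one token per line with whitespace-separated columns, the label in the last column, and an empty line after each sequence. The sketch below is an illustration only: the helper names, the example words and labels, and both feature templates are hypothetical, since the description does not give the actual templates or label sets.

```python
def to_crfpp(sequences):
    """Serialize sequences of rows into CRF++ column format.

    Each row is a tuple of columns; an empty line follows every sequence.
    """
    return "".join(
        "\n".join(" ".join(row) for row in seq) + "\n\n" for seq in sequences
    )


def stage1_rows(words, categories):
    """Stage 1: columns are word and category label."""
    return list(zip(words, categories))


def stage2_rows(words, predicted_categories, persons):
    """Stage 2: the stage 1 category becomes a feature column."""
    return list(zip(words, predicted_categories, persons))


# Hypothetical templates in CRF++ syntax (%x[row,col]; "B" adds label bigrams).
STAGE1_TEMPLATE = "U00:%x[-1,0]\nU01:%x[0,0]\nU02:%x[1,0]\nB\n"
STAGE2_TEMPLATE = "U00:%x[0,0]\nU01:%x[0,1]\nU02:%x[-1,1]/%x[0,1]\nB\n"

# Illustrative record fragment and labels, not taken from the dataset.
words = ["Pere", "Joan", "pages"]
categories = ["name", "surname", "occupation"]
persons = ["husband", "husband", "husband"]

stage1_data = to_crfpp([stage1_rows(words, categories)])
stage2_data = to_crfpp([stage2_rows(words, categories, persons)])
print(stage1_data)
print(stage2_data)
```

In this arrangement the second model sees the category as column 1 of its input, so its template can condition the person label on both the word and the category, which mirrors the cascade described above. At test time the category column would hold the stage 1 predictions rather than gold labels.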