Authors: J. Ignacio Toledo, Manuel Carbonell, Alicia Fornés, Josep Lladós

Description: Sequence tagging is performed with a CNN+BLSTM neural network that accepts a sequence of word images as input and produces two independent softmax outputs (person and category) for each word. The word images tagged as meaningful are then transcribed by generating a sequence of PHOC embeddings that is fed into a BLSTM+CTC network.
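The two-output tagging step can be illustrated with a minimal sketch. It assumes the network has already produced one vector of category logits and one vector of person logits per word. The label lists, the "other" class used to mark non-meaningful words, and the function names are hypothetical and are not taken from the authors' system.

```python
import math

# Hypothetical label sets: the description only names the two heads
# ("person" and "category"); the concrete labels here are assumptions.
CATEGORIES = ["other", "name", "surname", "location", "occupation", "state"]
PERSONS = ["none", "husband", "wife"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def tag_word(category_logits, person_logits):
    """Decode the two heads independently, one label from each."""
    category = CATEGORIES[argmax(softmax(category_logits))]
    person = PERSONS[argmax(softmax(person_logits))]
    return category, person


def words_to_transcribe(tags):
    """Indices of words treated as meaningful (assumed: category != 'other')."""
    return [i for i, (category, _) in enumerate(tags) if category != "other"]
```

Only the word images selected by `words_to_transcribe` would then be passed on to the transcription network.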

Authors: Xiangping Wu, Qingcai Chen, Jinghan You

Description: For the competition on Information Extraction in Historical Handwritten Records, our method is divided into two parts: character recognition and named entity recognition.
In the handwritten old Catalan text recognition stage, we present a novel, segmentation-free, word-wise character recognition method that uses no external linguistic knowledge. In this method, the position information of each character is converted into a vector. A kind of uni-gram model is then constructed and integrated into a residual neural network for training. The character recognition process consists of three steps: (1) data pre-processing; (2) model training; and (3) model running. In the first step, we normalize each color image to a size of 100×200, chosen from statistics on the sizes of the word images, and add a terminator “*” at the end of each word. The second step is model building and training. The first part of the recognition model uses a ResNet to extract features from the input image and generate a feature vector. At the same time, we randomly generate a multi-dimensional vector for each position. Next, we combine the feature vector generated by the ResNet with the randomly generated position vector into a new feature vector. At the end of the network we add a fully connected network with one hidden layer and a dropout of 0.5. The output layer has 60 units: 59 classes of basic Catalan characters and the terminator “*”. The third step is model prediction. From statistics on the training set, we calculate the length of the longest word including the terminator. The predicted length of each word in the test set is then set to 15 to ensure that the end of a long word can be identified. Finally, we remove all terminators to obtain the word predictions. This character recognition method does not depend on external language information such as dictionaries. The main contribution of the position information is to guide the ResNet to learn automatically how to segment characters and identify their corresponding positions.
In the named entity recognition stage, we simply use CRF sequence tagging via the CRF++ toolkit. We first predict the category from the transcribed record using a first template. We then predict the person using a second template, the transcribed record, and the category predicted in the previous step.
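The terminator handling and the position vectors described above might look roughly like the following sketch. The fixed length of 15 and the “*” terminator come from the description. Padding with extra terminators, combining by concatenation, the position-vector dimension, and all function names are assumptions made for illustration only.

```python
import random

TERMINATOR = "*"
MAX_LEN = 15        # predicted length stated in the description
POSITION_DIM = 8    # hypothetical dimension of each position vector

# One random vector per character position, generated once and then reused.
_rng = random.Random(0)
POSITION_VECTORS = [
    [_rng.uniform(-1.0, 1.0) for _ in range(POSITION_DIM)]
    for _ in range(MAX_LEN)
]


def encode_target(word, max_len=MAX_LEN):
    """Append the terminator and pad to a fixed length.

    Padding with further terminators is an assumption of this sketch.
    """
    if len(word) + 1 > max_len:
        raise ValueError("word plus terminator exceeds the fixed length")
    return word + TERMINATOR * (max_len - len(word))


def combine(image_features, position):
    """Join image features with the position vector (assumed: concatenation)."""
    return list(image_features) + POSITION_VECTORS[position]


def decode_prediction(chars):
    """Remove every terminator from the per-position character predictions."""
    return "".join(c for c in chars if c != TERMINATOR)
```

For example, `encode_target("maria")` gives a 15-character training target, and `decode_prediction` applied to 15 per-position predictions recovers the word without terminators.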
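The two-stage tagging can be sketched in terms of the data files it would need. CRF++ reads one token per line with whitespace-separated columns, the label in the last column, and a blank line between sequences. The helper names below are hypothetical, and the actual templates and feature columns used by the authors are not given in the description.

```python
def crfpp_block(rows):
    """Format one record for CRF++: one token per line, tab-separated
    columns, label in the last column, blank line after the record."""
    return "\n".join("\t".join(row) for row in rows) + "\n\n"


def stage1_rows(words, categories):
    """Stage 1: the transcribed word is the feature, the category is the label."""
    return [(w, c) for w, c in zip(words, categories)]


def stage2_rows(words, predicted_categories, persons):
    """Stage 2: word plus stage-1 category as features, person as the label."""
    return [(w, c, p) for w, c, p in zip(words, predicted_categories, persons)]
```

At test time the category column of the second file would hold the stage-1 predictions rather than gold labels.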

Authors: Xiangping Wu, Qingcai Chen, Linlin Wang, Qing Zhang

Description: For the competition on Information Extraction in Historical Handwritten Records, our method is divided into two parts: character recognition and named entity recognition.
In the handwritten old Catalan text recognition stage, we present a novel, segmentation-free, word-wise character recognition method that uses no external linguistic knowledge. In this method, the position information of each character is converted into a vector. A kind of bi-gram model is then constructed and integrated into a convolutional neural network for training. The character recognition process consists of three steps: (1) data pre-processing; (2) model training; and (3) model running. In the first step, we normalize each color image to a size of 100×200, chosen from statistics on the sizes of the word images. We add a terminator “*” at the end of each word and then count the bi-gram combinations of all characters in the training set. Since digits are not combined with letters, we selected 2560 bi-gram combinations from 60 characters (59 primitives and 1 terminator) as the training classes. Next, we take into account the spatial location of each bi-gram inside the word: we select 14 positions and convert each position into a multi-dimensional random vector. The random vector for each position is generated only once. In the second step, we use a convolutional neural network (CNN) combined with the location information to build a system that, given an image, predicts its transcription without constructing any attribute features. The network is trained using the aggregated sigmoid cross-entropy (logistic) loss and a learning rate of 0.01. In the final step, an image and a location vector are run through the network, which outputs the prediction for that location in the image transcription. At run time, we output the recognition results for the 14 positions of each word and then remove the first occurrence of the terminator and all characters after it.
When predictions conflict, corrections are applied in post-processing according to bi-gram frequency statistics computed on the training set only. For example, if the probability of “er” is greater than the probability of “eu”, we choose “er” as the final prediction for the input image.
In the named entity recognition stage, we simply use CRF sequence tagging via the CRF++ toolkit. We first predict the category from the transcribed record using a first template. We then predict the person using a second template, the transcribed record, and the category predicted in the previous step.
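A minimal sketch of positional bi-gram targets and of the truncation at the first terminator follows. The 14 positions and the “*” terminator come from the description. The description does not say how bi-grams are laid out over the positions, so the overlapping sliding window used here, the padding with terminators, and the function names are all assumptions.

```python
TERMINATOR = "*"
NUM_POSITIONS = 14  # number of bi-gram positions stated in the description


def bigram_targets(word, num_positions=NUM_POSITIONS):
    """One bi-gram per position over the terminated, padded word.

    Assumes overlapping bi-grams: position i covers characters i and i+1.
    """
    if len(word) > num_positions:
        raise ValueError("word is too long for the available positions")
    padded = word + TERMINATOR * (num_positions + 1 - len(word))
    return [padded[i:i + 2] for i in range(num_positions)]


def truncate_at_terminator(text):
    """Drop the first terminator and everything after it."""
    return text.split(TERMINATOR, 1)[0]


def decode_bigrams(bigrams):
    """Rebuild the word from consistent overlapping bi-grams, then truncate."""
    chars = "".join(b[0] for b in bigrams) + bigrams[-1][1]
    return truncate_at_terminator(chars)
```

With this layout, `bigram_targets("pere")` starts with `"pe"`, `"er"`, `"re"`, `"e*"`, and decoding the 14 targets returns `"pere"`. Note that 60 characters allow 60 × 60 = 3600 possible bi-grams, of which the description keeps 2560 as classes.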
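The frequency-based correction could be sketched as below, assuming that a conflict means several candidate bi-grams compete for the same position and that the candidate seen most often in the training set wins. The function names and the toy training words are hypothetical.

```python
from collections import Counter


def bigram_counts(training_words, terminator="*"):
    """Count bi-grams over the training words, each followed by the terminator."""
    counts = Counter()
    for word in training_words:
        padded = word + terminator
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return counts


def resolve_conflict(candidates, counts):
    """Pick the candidate bi-gram with the highest training-set frequency.

    Unseen bi-grams count as zero; ties go to the earlier candidate.
    """
    return max(candidates, key=lambda bigram: counts[bigram])
```

Comparing raw counts gives the same choice as comparing relative frequencies, since both are computed over the same training set.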

Ranking Table

Date | Method | Basic Score | Complete Score | Name | Surname | Location | Occupation | State | Input Type
2018-10-26 | Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Mode | 94.62% | 94.02% | 95.49% | 91.32% | 95.18% | 93.89% | 97.21% | WORD
2017-07-08 | Resnet based uni-gram method for segment-free liaison handwriting recognition and NER tagging | 94.18% | 91.99% | 95.68% | 91.23% | 94.93% | 93.77% | 95.35% | WORD
2017-07-08 | CNN based Bi-gram method for segment-free liaison handwriting recognition and NER tagging | 87.58% | 85.74% | 91.82% | 69.19% | 89.36% | 91.04% | 97.82% | WORD
2017-07-01 | Baseline CNN | 79.42% | 70.20% | 83.01% | 65.25% | 66.31% | 86.26% | 97.68% | WORD

Ranking Graphic
