method: Naver Labs (2018-06-25)

Authors: Animesh Prasad, Hervé Déjean, Jean-Luc Meunier, Max Weidemann, Johannes Michael, Gundram Leifert

Description: For this task we use a pipeline approach: first the line image is preprocessed and then passed through a CNN-BLSTM architecture trained with CTC loss (i.e. HTR). In the next step, we use a BLSTM over the feature layer (computed as all character n-grams of the tokens generated by best-effort decoding of the HTR output), trained with cross-entropy loss to maximize the accuracy.
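The two building blocks named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the HTR output is a per-frame probability matrix over an alphabet whose index 0 is the CTC blank, and it shows best-path (greedy) decoding followed by character n-gram extraction for the downstream tagger.

```python
BLANK = 0  # assumed CTC blank index

def greedy_ctc_decode(prob_matrix, alphabet):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in prob_matrix]
    decoded, prev = [], None
    for idx in path:
        if idx != prev and idx != BLANK:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)

def char_ngrams(token, n_max=3):
    """All character n-grams (n = 1..n_max) of a token, used as tagger features."""
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams
```

For example, frames whose argmaxes read blank-free as "a a - b" decode to "ab", and `char_ngrams("abc", 2)` yields the unigrams and bigrams of the token.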

method: CITlab ARGUS (with OOV) (2017-07-09)

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Description: The training data is divided into a training set (2790 line images) and a validation set (280 line images). Several normalization methods, such as contrast, size, slant and skew normalization, are applied. These preprocessed line images serve as input for the optical model, a recurrent neural network (layers from input to output: conv, conv, lstm (256 cells), conv, lstm (512 cells)) trained by CTC (150 epochs of 5000 noisy line images each). To enlarge input variety, we apply data augmentation to the line images.
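The description does not specify which augmentations produce the "noisy" line images, so the following is only an illustrative sketch: additive pixel noise on a grayscale line image represented as a list of rows with values in 0-255. The noise range and image encoding are assumptions for demonstration.

```python
import random

def augment_line_image(image, noise=10, seed=None):
    """Return a noisy copy of a grayscale line image (rows of 0-255 pixels)."""
    rng = random.Random(seed)
    return [[min(255, max(0, px + rng.randint(-noise, noise))) for px in row]
            for row in image]
```

During training, each epoch would draw fresh noisy copies of the line images rather than reusing a fixed set.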
The output of the optical model is a matrix of probabilities for each character at each position in the image. The output matrices for the lines of one record are concatenated into a single matrix. We define regular expressions to extract the required information from this matrix. This is done in two steps: first, we segment the matrix into regions of interest, i.e. regions containing information about the husband, the husband's parents, the wife, or the wife's parents. Second, these regions are matched against valid combinations of dictionary items. For the name fields, additional OOV words are allowed if no dictionary item fits.
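The two-step extraction can be sketched as follows. This is a hedged toy version operating on an already-decoded record string rather than the probability matrix, and the region patterns, field markers, and dictionary are invented examples, not the competition's actual lexica; OOV name words are simply flagged with a trailing `*`.

```python
import re

# Toy region patterns; the real method matches far richer regular expressions
# directly against the character-probability matrix.
REGIONS = {
    "husband": re.compile(r"husband:\s*(\w+ \w+)"),
    "wife": re.compile(r"wife:\s*(\w+ \w+)"),
}

def segment(record):
    """Step one: cut the record into regions of interest."""
    return {name: m.group(1)
            for name, pat in REGIONS.items()
            if (m := pat.search(record))}

def match_names(text, name_dict):
    """Step two: prefer dictionary items, but allow OOV words for name fields."""
    return [w if w in name_dict else w + "*" for w in text.split()]
```

For instance, segmenting "husband: Josep Costa wife: Maria Vila" yields the husband and wife regions, and matching "Maria Vila" against a dictionary containing only "Maria" keeps "Maria" and flags "Vila*" as OOV.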

method: CITlab ARGUS (with OOV, net2) (2017-07-10)

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Description: The training data is divided into a training set (2790 line images) and a validation set (280 line images). Several normalization methods, such as contrast, size, slant and skew normalization, are applied. These preprocessed line images serve as input for the optical model, a recurrent neural network (layers from input to output: conv, conv, blstm (512 cells), conv, blstm (512 cells), blstm (512 cells)) trained by CTC (150 epochs of 5000 noisy line images each). To enlarge input variety, we apply data augmentation to the line images.
The output of the optical model is a matrix of probabilities for each character at each position in the image. The output matrices for the lines of one record are concatenated into a single matrix. We define regular expressions to extract the required information from this matrix. This is done in two steps: first, we segment the matrix into regions of interest, i.e. regions containing information about the husband, the husband's parents, the wife, or the wife's parents. Second, these regions are matched against valid combinations of dictionary items. For the name fields, additional OOV words are allowed if no dictionary item fits.

Ranking Table

| Date | Method | Basic Score | Complete Score | Name | Surname | Location | Occupation | State | Input Type |
|---|---|---|---|---|---|---|---|---|---|
| 2018-06-25 | Naver Labs | 95.46% | 95.03% | 97.01% | 92.73% | 95.03% | 96.43% | 96.41% | LINE |
| 2017-07-09 | CITlab ARGUS (with OOV) | 91.94% | 91.58% | 95.14% | 85.78% | 88.43% | 93.08% | 97.54% | LINE |
| 2017-07-10 | CITlab ARGUS (with OOV, net2) | 91.63% | 91.19% | 95.09% | 85.84% | 87.32% | 92.96% | 97.19% | LINE |
| 2018-10-27 | Joint HTR + NER no postprocessing | 90.59% | 89.40% | 89.94% | 84.07% | 90.71% | 92.10% | 96.59% | LINE |
| 2017-07-09 | CITlab ARGUS (without OOV) | 89.54% | 89.17% | 94.37% | 76.54% | 87.65% | 92.66% | 97.43% | LINE |
| 2017-07-01 | Baseline HMM | 80.28% | 63.11% | 81.06% | 60.15% | 78.90% | 90.23% | 93.79% | LINE |

Ranking Graphic