Robust Reading Competition

method: Naver Labs, 2018-06-25

Authors: Animesh Prasad, Hervé Déjean, Jean-Luc Meunier, Max Weidemann, Johannes Michael, Gundram Leifert

Description: For this task we use a pipeline approach: first the line image is preprocessed and then passed through a CNN-BLSTM architecture with CTC loss (i.e. HTR). In the next step, we use a BLSTM over the feature layer (computed from all character n-grams of the tokens generated by best-effort decoding of the HTR output), trained with cross-entropy loss to maximize accuracy.
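The character n-gram features mentioned above could be computed along the lines of the following minimal sketch. This is an illustration only: the boundary marker `#` and the 1–3 gram range are assumptions, not details given by the authors.

```python
def char_ngrams(token, n_min=1, n_max=3):
    """Collect all character n-grams of a token, with '#' as a
    word-boundary marker (an assumed convention)."""
    padded = f"#{token}#"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams
```

In practice these n-grams would be hashed or looked up in a vocabulary to form the fixed-size feature vector fed to the BLSTM.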

method: CITlab ARGUS (with OOV), 2017-07-09

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Description: The training data is divided into a training set (2790 line images) and a validation set (280 line images). Several normalization methods, such as contrast, size, slant and skew normalization, are applied. These preprocessed line images serve as input for the optical model, a recurrent neural network (layers from input to output: conv, conv, lstm (256 cells), conv, lstm (512 cells)) trained by CTC (150 epochs of 5000 noisy line images each). To enlarge input variety, we apply data augmentation to the line images.
The output of the optical model is a matrix of probabilities for each character at each position in the image. The output matrices for the lines of one record are concatenated into a single matrix. We define regular expressions to extract the required information from this matrix. This is done in two steps: first, we segment the matrix into regions of interest, i.e. regions containing information about the husband, the husband's parents, the wife, or the wife's parents. In a second step, these regions are matched against valid combinations of dictionary items. For the name fields, additional OOV words are allowed if no dictionary item fits.
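Extracting a transcription from such a probability matrix is commonly done with best-path (greedy) CTC decoding: take the argmax character per column, collapse repeats, and drop blanks. The sketch below illustrates that standard procedure; the alphabet layout and the blank at index 0 are assumptions, not the authors' exact setup.

```python
def best_path_decode(prob_matrix, alphabet, blank=0):
    """Greedy CTC decoding: argmax per time step, collapse repeated
    symbols, then remove the blank symbol."""
    best = [max(range(len(col)), key=col.__getitem__) for col in prob_matrix]
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)
```

For example, with `alphabet = ["-", "a", "b"]` (blank first), a matrix whose argmax sequence is `a a - b` decodes to `"ab"`. The regular-expression matching described above would then run over the full matrix rather than a single best path, but the collapsing logic is the same.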

method: CITlab ARGUS (with OOV, net2), 2017-07-10

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Description: The training data is divided into a training set (2790 line images) and a validation set (280 line images). Several normalization methods, such as contrast, size, slant and skew normalization, are applied. These preprocessed line images serve as input for the optical model, a recurrent neural network (layers from input to output: conv, conv, blstm (512 cells), conv, blstm (512 cells), blstm (512 cells)) trained by CTC (150 epochs of 5000 noisy line images each). To enlarge input variety, we apply data augmentation to the line images.
The output of the optical model is a matrix of probabilities for each character at each position in the image. The output matrices for the lines of one record are concatenated into a single matrix. We define regular expressions to extract the required information from this matrix. This is done in two steps: first, we segment the matrix into regions of interest, i.e. regions containing information about the husband, the husband's parents, the wife, or the wife's parents. In a second step, these regions are matched against valid combinations of dictionary items. For the name fields, additional OOV words are allowed if no dictionary item fits.
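The dictionary matching with an OOV fallback could be sketched as below: pick the closest dictionary entry by edit distance, and keep the raw recognition when nothing is close enough. The threshold `max_dist` is an illustrative assumption, not the authors' actual acceptance criterion.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def match_field(recognized, dictionary, max_dist=2):
    """Return the closest dictionary entry, or fall back to the raw
    (OOV) recognition if no entry is within max_dist edits."""
    best = min(dictionary, key=lambda w: edit_distance(recognized, w))
    return best if edit_distance(recognized, best) <= max_dist else recognized
```

So a slightly garbled recognition like `"Barcelpna"` would snap to a dictionary entry `"Barcelona"`, while a string far from every entry would be kept verbatim as an OOV name.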

Ranking Table

| Date | Method | Basic Score | Complete Score | Name | Surname | Location | Occupation | State |
|---|---|---|---|---|---|---|---|---|
| 2018-06-25 | Naver Labs | 95.46% | 95.03% | 97.01% | 92.73% | 95.03% | 96.43% | 96.41% |
| 2017-07-09 | CITlab ARGUS (with OOV) | 91.94% | 91.58% | 95.14% | 85.78% | 88.43% | 93.08% | 97.54% |
| 2017-07-10 | CITlab ARGUS (with OOV, net2) | 91.63% | 91.19% | 95.09% | 85.84% | 87.32% | 92.96% | 97.19% |
| 2017-07-09 | CITlab ARGUS (without OOV) | 89.54% | 89.17% | 94.37% | 76.54% | 87.65% | 92.66% | 97.43% |
| 2017-07-01 | Baseline HMM | 80.28% | 63.11% | 81.06% | 60.15% | 78.90% | 90.23% | 93.79% |

Ranking Graphic
