method: Baseline CNN (2017-07-01)

Authors: Organizers

Description: This baseline method is based on Convolutional Neural Networks (CNNs).
We divide the 100 pages of available training data into 90 pages (28346 word images) for training and 10 pages (3155 word images) for validation.
This data is used to train two different neural network models.
The first model performs the semantic categorization.
It is a relatively small CNN that takes a word image as input and outputs the semantic category of that word.

The second model performs the transcription. It has two distinct parts: the first is a CNN that embeds small windows
of text into the PHOC (Pyramidal Histogram of Characters) space; the second is a two-layer BLSTM network that performs the sequence recognition and outputs the transcription.
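
To make the PHOC space concrete, the following is a minimal sketch of how a PHOC vector can be computed for a text string. It only illustrates the target representation, not the CNN that predicts it from image windows. The alphabet, the pyramid levels and the function name `phoc` are assumptions made for this example; the description above does not state which configuration was used.

```python
import string

# Assumed alphabet and pyramid levels; the actual configuration is not given.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (1, 2, 3)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return the binary PHOC vector of `word` as a flat list of 0/1 ints."""
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue
                # Work in integer units of 1 / (n * level) to avoid float
                # rounding: character k spans [k*level, (k+1)*level] and the
                # region spans [region*n, (region+1)*n].
                overlap = min((region + 1) * n, (k + 1) * level) - max(region * n, k * level)
                # A character is assigned to a region when at least half of
                # its span lies inside that region.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vector.extend(bits)
    return vector
```

With these assumed settings every word maps to a vector of 6 x 36 = 216 bits (one block of 36 bits per region, six regions over the three levels), regardless of the word length.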

Both models were trained with 'early stopping': training continues until validation accuracy has not improved for 20 consecutive epochs.
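
The stopping rule can be sketched as the loop below. This is an illustrative outline, not the code used for the baseline: `train_one_epoch` and `validate` are hypothetical callables standing in for the real training and validation steps, and `max_epochs` is an assumed safety limit that the description does not mention.

```python
def train_with_early_stopping(train_one_epoch, validate, patience=20, max_epochs=1000):
    """Train until validation accuracy has not improved for `patience` epochs.

    `train_one_epoch` runs one pass over the training set; `validate` returns
    the current validation accuracy. Both are hypothetical placeholders.
    Returns the epoch index and accuracy of the best validation result.
    """
    best_accuracy = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        accuracy = validate()
        if accuracy > best_accuracy:
            # New best result: remember it and reset the patience counter.
            best_accuracy = accuracy
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_accuracy
```

In practice one would also save the model weights whenever a new best accuracy is reached, so that the best checkpoint rather than the last one is kept.
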
Finally, a parser assigns a person to each categorized word. We use anchor words to distinguish the persons. For example, the keyword 'ab' marks the start
of the information concerning the wife. The keyword 'fill' separates the husband from his parents, while 'filla' separates the wife from her parents. The word 'y' separates the father from the mother.
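
The anchor-word rules can be sketched as a small state machine over the transcribed words. This is a simplified illustration under stated assumptions, not the parser used for the baseline: the label names, the `ANCHOR_LABEL` used for the anchor words themselves, and the assumption that every record starts with the husband are all hypothetical.

```python
ANCHOR_LABEL = "none"  # hypothetical label for the anchor words themselves


def assign_persons(words):
    """Label each transcribed word with the person it is assumed to refer to."""
    person = "husband"  # assumption: a record starts with the husband
    labelled = []
    for word in words:
        token = word.lower()
        if token == "ab":
            # 'ab' starts the information about the wife.
            person = "wife"
        elif token == "fill" and person == "husband":
            # 'fill' separates the husband from his parents.
            person = "husband_father"
        elif token == "filla" and person == "wife":
            # 'filla' separates the wife from her parents.
            person = "wife_father"
        elif token == "y" and person.endswith("_father"):
            # 'y' separates the father from the mother.
            person = person.replace("_father", "_mother")
        else:
            # Not an anchor in this context: keep the current person.
            labelled.append((word, person))
            continue
        labelled.append((word, ANCHOR_LABEL))
    return labelled
```

For an invented word sequence (not taken from the dataset) such as `["Joan", "fill", "de", "Pere", "y", "Maria", "ab", "Anna"]`, the sketch labels "Joan" as husband, "de" and "Pere" as husband_father, "Maria" as husband_mother and "Anna" as wife, while the anchor words receive the placeholder label.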