Method: CITlab Argus Information Extraction (positional & line features, enhanced gt) 2019-05-05

Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Jochen Zöllner, Roger Labahn

Description: We interpreted this task as a named entity recognition (NER) problem. We trained a deep neural model based on Ma and Hovy (2016) to tag the company, the date, etc. in the tokenized receipt text lines. To produce ground truth for this task, we parsed the ground truth from Task 2 as well as our own recognition results. To enrich the training data, we randomly replaced words during training. As post-processing, we filtered duplicated results of the category "total".
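The two data-handling steps mentioned above, random word replacement and filtering duplicated "total" results, could look roughly like the following. This is a minimal sketch and not the submitted implementation: the function names, the replacement rate, the replacement vocabulary and the choice to keep the first "total" are all assumptions made for illustration.

```python
import random


def replace_random_words(tokens, vocabulary, rate, rng):
    """Return a copy of `tokens` with words randomly swapped.

    Each token is replaced by a random word from `vocabulary` with
    probability `rate`. The tag sequence is left untouched, so the
    model sees noisy inputs with unchanged labels (an assumption).
    """
    return [
        rng.choice(vocabulary) if rng.random() < rate else token
        for token in tokens
    ]


def filter_duplicate_totals(predictions):
    """Keep only the first prediction of category "total".

    `predictions` is a list of (category, text) pairs in reading order.
    Keeping the first hit is an assumption; the description only states
    that duplicates of this category were filtered.
    """
    kept = []
    seen_total = False
    for category, text in predictions:
        if category == "total":
            if seen_total:
                continue
            seen_total = True
        kept.append((category, text))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    line = ["TOTAL", "9.00", "CASH", "10.00"]
    print(replace_random_words(line, ["FOO", "BAR", "1.23"], 0.3, rng))
    print(filter_duplicate_totals(
        [("company", "ACME STORE"), ("total", "9.00"), ("total", "10.00")]
    ))
```

Passing the random generator explicitly keeps the augmentation reproducible, which makes it easier to compare runs with different noise levels.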
Compared to previous methods, we modified the training data by tagging only the first occurrence of each ground truth value. Unfortunately, both training data sets seem to be imperfect. Furthermore, we increased the training noise and added a feature indicating whether a new line starts.
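Tagging only the first occurrence of each ground truth value, together with a line-start feature, might be sketched as follows. The BIO tag scheme, whitespace tokenization, exact string matching and the function name are assumptions for illustration; the description does not specify how matching was done.

```python
def tag_first_occurrence(lines, ground_truth):
    """Tag receipt tokens with the first match of each ground truth value.

    `lines` is a list of text lines, `ground_truth` maps a category
    (e.g. "company", "total") to its string value. Returns a list of
    (token, tag, starts_line) triples, where `starts_line` is True for
    the first token of every line.
    """
    tokens = []
    starts_line = []
    for line in lines:
        for position, word in enumerate(line.split()):
            tokens.append(word)
            starts_line.append(position == 0)

    tags = ["O"] * len(tokens)
    for category, value in ground_truth.items():
        target = value.split()
        length = len(target)
        if length == 0:
            continue
        for start in range(len(tokens) - length + 1):
            span = slice(start, start + length)
            # Only an untagged exact match counts; later matches are ignored.
            if tokens[span] == target and all(t == "O" for t in tags[span]):
                tags[start] = "B-" + category
                for k in range(start + 1, start + length):
                    tags[k] = "I-" + category
                break  # stop after the first occurrence

    return list(zip(tokens, tags, starts_line))


if __name__ == "__main__":
    receipt = ["ACME STORE", "TOTAL 9.00", "CASH 9.00"]
    truth = {"company": "ACME STORE", "total": "9.00"}
    for row in tag_first_occurrence(receipt, truth):
        print(row)
```

In this sketch the second "9.00" (after "CASH") stays untagged, which mirrors the idea of labelling only the first occurrence.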