Authors: Tencent Youtu Lab, USTC
Affiliation: Tencent Youtu Lab, USTC
Description: A novel architecture is proposed for the key information extraction task;
it exploits multi-modal (i.e., CV and NLP) information.
Global visual features are introduced to sharpen the discrimination between layouts,
which improves the detection of easily confused and hard samples.
Moreover, fine-grained labels and a customized attention mechanism are used to improve the algorithm's performance on boundary characters.
Following the same evaluation rules as other competitors, OCR mismatch errors are excluded from this submission. The paper is in preparation.