Method: GraphDoc+Classify+Merge - Task 1 - Key Information Localization and Extraction - Document Information Localization and Extraction

Method: GraphDoc+Classify+Merge

Date: 2023-05-24

Authors: Yan Wang, Jiefeng Ma, Zhenrong Zhang, Pengfei Hu, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Description: We pre-trained several GraphDoc models on the provided unlabelled documents under different configurations, then fine-tuned them on the training set for 200-500 epochs. After classifying the OCR boxes into categories, we proposed a Merger module to aggregate the classified boxes.
We also applied pre- and post-processing based on the text content and the distances between OCR boxes. Finally, we adopted model ensembling to further improve system performance.
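
The classify, ensemble, and merge steps described above can be illustrated with a minimal sketch. This is not the submitted system: the `Box` type, the `ensemble_label` and `merge_boxes` helpers, the `max_gap` threshold, and the label names are all hypothetical, and the learned Merger module is replaced here by a plain distance rule that groups same-category boxes lying close together.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Box:
    """One OCR box with its text, corners, and predicted category (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    label: str = ""


def ensemble_label(prob_sets: Sequence[Dict[str, float]]) -> str:
    """Sum per-class probabilities from several models and return the top class.

    The argmax of the sum equals the argmax of the mean, so no division is needed.
    """
    totals: Dict[str, float] = {}
    for probs in prob_sets:
        for name, p in probs.items():
            totals[name] = totals.get(name, 0.0) + p
    return max(totals, key=totals.get)


def gap(a: Box, b: Box) -> float:
    """Largest axis-aligned gap between two rectangles (0 if they overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return max(dx, dy)


def merge_boxes(boxes: List[Box], max_gap: float = 10.0) -> List[Box]:
    """Group boxes that share a label and lie within max_gap, using union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            same_label = boxes[i].label == boxes[j].label
            if same_label and gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = []
    for members in groups.values():
        members.sort(key=lambda b: (b.y0, b.x0))  # approximate reading order
        merged.append(Box(
            text=" ".join(m.text for m in members),
            x0=min(m.x0 for m in members),
            y0=min(m.y0 for m in members),
            x1=max(m.x1 for m in members),
            y1=max(m.y1 for m in members),
            label=members[0].label,
        ))
    merged.sort(key=lambda b: (b.y0, b.x0))
    return merged
```

In this sketch, `ensemble_label` would be applied per OCR box to the outputs of the fine-tuned models before `merge_boxes` is called. A fixed pixel threshold is only a stand-in; a learned merger can use text content and layout cues that a single distance cannot capture.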

Multimodal Pre-training Based on Graph Attention Network for Document Understanding

Source code