Method: Multi-modal based KIE through model fusion (2023-03-21)

Authors: Jie Li, Wei Wang, Min Xu, Yiru Zhao, Bin Zhang, Pengyu Chen, Danya Zhou, Yuqi Zhang, Ruixue Zhang, Di Wang, Hui Wang, Chao Li, Shiyu Hu, Dong Xiang, Songtao Li, Yunxin Yang

Affiliation: SPDB LAB

Email: 18206291823@163.com

Description: Our approach to document key information extraction is based on fine-tuning LayoutLMv3, a pre-trained model for document analysis and recognition. We used the general-purpose LayoutLMv3 model as the foundation and fine-tuned it on the competition data. To address the long-tailed, imbalanced distribution of the Task 2 competition data, we synthesized additional data for the minority categories. In post-processing, we sorted the extracted text for each category according to its reading order. Furthermore, we trained with 3-fold cross-validation and fused the outputs of the three resulting models to obtain a more robust result. Our method achieved an F1 score of approximately 0.85 on the validation set, demonstrating its effectiveness in extracting information from various document types.
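The description does not specify how the outputs of the three fold models are combined. A common choice for this kind of fusion is soft voting: average the per-token label logits from each fold model, then take the argmax. The sketch below illustrates that idea with NumPy; the function name and the toy logit values are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def fuse_fold_logits(fold_logits):
    """Soft-vote fusion across fold models (illustrative sketch).

    fold_logits: list of arrays, each of shape (num_tokens, num_labels),
                 one array per fold model.
    Returns the fused predicted label index for each token.
    """
    stacked = np.stack(fold_logits, axis=0)   # (n_folds, num_tokens, num_labels)
    mean_logits = stacked.mean(axis=0)        # average logits across folds
    return mean_logits.argmax(axis=-1)        # pick the highest-scoring label

# Toy example: 3 fold models, 2 tokens, 3 candidate labels (values invented)
folds = [
    np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]),
    np.array([[0.1, 0.8, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.3, 0.4, 0.3], [0.7, 0.2, 0.1]]),
]
preds = fuse_fold_logits(folds)  # → array([1, 0])
```

Averaging logits (rather than hard majority voting) lets a model that is confidently correct outvote two models that are weakly wrong, which typically makes the ensemble more robust on minority categories.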