Task 2 - E2E Complex Entity Labeling - Method: multi-modal based KIE through model fusion
Method: multi-modal based KIE through model fusion
Date: 2023-03-21
Authors: Jie Li, Wei Wang, Min Xu, Yiru Zhao, Bin Zhang, Pengyu Chen, Danya Zhou, Yuqi Zhang, Ruixue Zhang, Di Wang, Hui Wang, Chao Li, Shiyu Hu, Dong Xiang, Songtao Li, Yunxin Yang
Affiliation: SPDB LAB
Email: 18206291823@163.com
Description: We tackle the semantic entity recognition (SER) task on the XFUND dataset, using F1 score as the evaluation metric. We split the data into an 80% training set and a 20% validation set. Our method combines two multi-modal document models, LayoutLMv3 and ERNIE-Layout, which we chose for their strong performance on document analysis tasks. We preprocess the input data into the XFUND format and perform basic cleaning and tokenization. To address class imbalance, we synthesized additional data for certain under-represented classes and augmented the images with random cropping. During inference, we used the official OCR results as input to our models. Our method achieved an F1 score of 86.0% on the validation set. In post-processing, we fused the outputs of LayoutLMv3 and ERNIE-Layout by weighting each model's predictions according to its per-class F1 score.
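The description leaves the exact fusion rule open. Below is a minimal sketch of one plausible reading of "weighting by per-class F1", assuming both models produce per-token probability distributions over the same label set, aligned to the same OCR tokens. The function names `per_class_f1` and `fuse`, the normalized weighting rule, and the example labels and scores are hypothetical; per-class F1 is computed at token level for brevity, whereas XFUND SER is typically evaluated at entity level.

```python
from collections import Counter
from typing import Dict, List, Sequence


def per_class_f1(gold: Sequence[str], pred: Sequence[str],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Token-level F1 per label (a simplification: XFUND SER is usually scored on entities)."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # the gold label g was missed
    scores = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def fuse(probs_a: Sequence[Sequence[float]], probs_b: Sequence[Sequence[float]],
         f1_a: Dict[str, float], f1_b: Dict[str, float],
         labels: Sequence[str]) -> List[str]:
    """Per-token fusion of two models' class probabilities.

    Hypothetical rule: for class c, model A's probability gets weight
    f1_a[c] / (f1_a[c] + f1_b[c]) and model B's gets the remainder.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same tokens")
    fused = []
    for pa, pb in zip(probs_a, probs_b):
        scores = []
        for i, c in enumerate(labels):
            total = f1_a[c] + f1_b[c]
            # Equal weights if neither model ever predicted this class correctly.
            w_a = f1_a[c] / total if total else 0.5
            scores.append(w_a * pa[i] + (1.0 - w_a) * pb[i])
        # Pick the label with the highest fused score.
        best = max(range(len(labels)), key=scores.__getitem__)
        fused.append(labels[best])
    return fused


# Usage with made-up validation F1 scores and probabilities for two tokens.
labels = ["QUESTION", "ANSWER", "OTHER"]
f1_a = {"QUESTION": 0.9, "ANSWER": 0.6, "OTHER": 0.8}
f1_b = {"QUESTION": 0.7, "ANSWER": 0.9, "OTHER": 0.8}
print(fuse([[0.1, 0.45, 0.45], [0.8, 0.1, 0.1]],
           [[0.1, 0.6, 0.3], [0.3, 0.2, 0.5]], f1_a, f1_b, labels))
# ['ANSWER', 'QUESTION']
```

Normalizing the two F1 scores per class lets each model dominate exactly the classes where it was more reliable on the validation set, rather than applying one global weight to the whole label set.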