method: sample-1

Date: 2023-03-25

Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Email: zzr666@mail.ustc.edu.cn

Description: 1. We use UniLM [2] and LiLT [3] as decoders to exploit text and layout information. OCR results, sorted by manual rules, are fed into the decoders to predict the target.
2. We ensemble DocPrompt [1], UniLM [2], and LiLT [3].
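
The description does not state the sorting rules or how the model outputs are combined, so the following is only a minimal sketch under assumed choices: OCR words are grouped into lines by vertical-centre proximity and read left to right, and answers are combined by case-insensitive majority vote. The names `OcrWord`, `sort_reading_order`, `ensemble_answers`, and the `line_tol` parameter are hypothetical and do not come from the submission.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OcrWord:
    """Hypothetical OCR token: text plus an axis-aligned bounding box."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def sort_reading_order(words, line_tol=0.5):
    """Assumed rule: top-to-bottom lines, left-to-right within a line.

    A word joins the current line when its vertical centre lies within
    line_tol * (taller of the two box heights) of the previous word's centre.
    """
    lines = []
    for word in sorted(words, key=lambda w: (w.y0 + w.y1) / 2):
        centre = (word.y0 + word.y1) / 2
        if lines:
            ref = lines[-1][-1]
            ref_centre = (ref.y0 + ref.y1) / 2
            tol = line_tol * max(word.y1 - word.y0, ref.y1 - ref.y0)
            if abs(centre - ref_centre) <= tol:
                lines[-1].append(word)
                continue
        lines.append([word])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda w: w.x0))
    return ordered


def ensemble_answers(answers):
    """Assumed combination rule: majority vote on normalised answers.

    Ties are broken in favour of the earliest answer in the input list.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    normalised = [a.strip().lower() for a in answers]
    counts = Counter(normalised)
    best = max(counts.values())
    for raw, norm in zip(answers, normalised):
        if counts[norm] == best:
            return raw.strip()


if __name__ == "__main__":
    words = [
        OcrWord("due", 55, 11, 80, 21),
        OcrWord("Invoice", 10, 0, 60, 10),
        OcrWord("Total", 10, 10, 50, 20),
    ]
    print(" ".join(w.text for w in sort_reading_order(words)))
    print(ensemble_answers(["Paris", "paris ", "London"]))
```

In this sketch the sorted word sequence is what a decoder would consume as input, and the vote is taken over one answer string per model. A real system might instead weight the models by validation score or handle multi-column layouts, neither of which is specified here.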

[1] https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/README_ch.md

[2] https://github.com/microsoft/unilm/blob/master/s2s-ft/

[3] Jiapeng Wang, Lianwen Jin and Kai Ding. LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding. 2022, ACL.