method: GraphDoc (2022-03-18)

Authors: Zhenrong Zhang, Jiefeng Ma, Jun Du

Affiliation: National Engineering Research Center of Speech and Language Information Processing (NERC-SLIP), University of Science and Technology of China.

Email: zzr666@mail.ustc.edu.cn

Description: 1. GraphDoc is a multi-modal graph attention-based model for various Document Understanding tasks.
2. GraphDoc is pretrained on the RVL-CDIP training dataset, which contains only 320k document images.
4. Following the same evaluation rules as other submissions, OCR mismatch errors are excluded from this submission.