Authors: Zhenrong Zhang, Jiefeng Ma, Jun Du
Affiliation: National Engineering Research Center of Speech and Language Information Processing (NERC-SLIP), University of Science and Technology of China.
Description: 1. GraphDoc is a multi-modal graph attention-based model for various Document Understanding tasks.
2. GraphDoc is pretrained on the RVL-CDIP training set, which contains only 320k document images.
4. Following the same evaluation rules as other submissions, we exclude OCR mismatch errors from this submission.