method: LayoutMask-v2 2023-03-21

Authors: Yi Tu

Affiliation: Ant Group

Description: We use LayoutMask, a multi-modal pre-trained model, as the backbone; it takes text and layout information as the model input. We pre-train on both Chinese and English document data. During fine-tuning, we also use the label information from Task 1 as auxiliary data.