Method: OpenDoc (single model) 2023-10-09

Authors: Huan Chen, Ya Guo, Yi Tu, Jinyang Tang, Chong Zhang, Huijia Zhu

Affiliation: Ant Group


Description:
1. We connect a LayoutMask-0.1b encoder to an AntGLM-10b decoder through a linear projection.
2. We merge two OCR results with a union strategy based on IoU (intersection over union).
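
The submission does not spell out how the IoU-based union in step 2 works, so the following is only a minimal sketch of one plausible reading: keep every word from the first OCR result, and add a word from the second result only when its box does not overlap any kept box above a threshold. The names `iou` and `union_ocr`, the (x_min, y_min, x_max, y_max) box format, and the 0.5 threshold are all assumptions made for illustration, not details taken from the authors.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max). Hypothetical, not from the submission.
Box = Tuple[float, float, float, float]
Word = Tuple[str, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_ocr(primary: List[Word], secondary: List[Word],
              threshold: float = 0.5) -> List[Word]:
    """Union of two OCR results (illustrative only).

    All words of `primary` are kept. A word of `secondary` is added only if
    its IoU with every primary box is below `threshold`, i.e. it looks like
    text that the primary result missed rather than a duplicate detection.
    """
    merged = list(primary)
    for text, box in secondary:
        if all(iou(box, kept_box) < threshold for _, kept_box in primary):
            merged.append((text, box))
    return merged


if __name__ == "__main__":
    ocr_a = [("Invoice", (0, 0, 10, 2)), ("Total", (0, 5, 8, 7))]
    ocr_b = [("lnvoice", (0, 0, 10, 2.2)), ("42.00", (10, 5, 16, 7))]
    for text, box in union_ocr(ocr_a, ocr_b):
        print(text, box)
```

In this sketch the first result wins whenever two boxes overlap strongly, so the near-duplicate "lnvoice" is dropped while the non-overlapping "42.00" is added. Which result takes priority, and which threshold is used, are design choices the submission does not state.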

Tu, Yi, et al. "LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding." arXiv preprint arXiv:2305.18721 (2023).

Du, Zhengxiao, et al. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling." arXiv preprint arXiv:2103.10360 (2021).