- Task 3 - E2E Zero-shot Structured Text Extraction - Method: OpenDoc(single model)
Method: OpenDoc(single model), 2023-10-09
Authors: Huan Chen, Ya Guo, Yi Tu, Jinyang Tang, Chong Zhang, Huijia Zhu
Affiliation: Ant Group
Email: chenhuan.chen@antgroup.com
Description:
1. We connect a LayoutMask-0.1B encoder to an AntGLM-10B decoder through a linear projection.
2. We combine two OCR results with a union strategy based on IoU.
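The submission does not say how the IoU-based union is computed, so the following is only a minimal sketch of one plausible reading: keep every detection from a first OCR result, and add a detection from the second result only when it does not overlap any kept box above a threshold. The function names (`iou`, `union_ocr`), the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the choice to prefer the first result on overlap are all assumptions for illustration, not details from the authors.

```python
from typing import List, Tuple

# Assumed formats: axis-aligned box (x1, y1, x2, y2) and (box, text) OCR items.
Box = Tuple[float, float, float, float]
OcrItem = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_ocr(
    primary: List[OcrItem],
    secondary: List[OcrItem],
    iou_threshold: float = 0.5,  # hypothetical value, not from the submission
) -> List[OcrItem]:
    """Keep all primary items; add secondary items that match no primary box."""
    merged = list(primary)
    for box, text in secondary:
        # A secondary detection is treated as new only if it overlaps
        # every primary box by less than the threshold.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in primary):
            merged.append((box, text))
    return merged


if __name__ == "__main__":
    primary = [((0, 0, 10, 10), "Total"), ((20, 0, 30, 10), "42")]
    secondary = [((1, 0, 10, 10), "Tota1"), ((40, 0, 50, 10), "USD")]
    for box, text in union_ocr(primary, secondary):
        print(box, text)
```

In this sketch the second engine only contributes text regions the first engine missed; a real system might instead pick the higher-confidence reading when two boxes overlap.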
Tu, Yi, et al. "LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding." arXiv preprint arXiv:2305.18721 (2023).
Du, Zhengxiao, et al. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling." arXiv preprint arXiv:2103.10360 (2021).