method: OpenDoc(single model) 2023-10-09

Authors: Huan Chen, Ya Guo, Yi Tu, Jinyang Tang, Chong Zhang, Huijia Zhu

Affiliation: Ant Group

Email: chenhuan.chen@antgroup.com

Description: 1. We connect a LayoutMask-0.1b encoder to an AntGLM-10b decoder through a linear projection.
2. We take the union of two OCR results, merging them according to IoU.
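
The IoU-based union in item 2 could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the names `iou` and `union_ocr`, the `(text, box)` word format, the 0.5 threshold, and the rule "keep everything from the first engine, add only non-overlapping words from the second" are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Word = Tuple[str, Box]                   # (recognized text, bounding box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_ocr(primary: List[Word], secondary: List[Word],
              threshold: float = 0.5) -> List[Word]:
    """Keep all primary words; add secondary words that overlap none of them.

    The threshold is a hypothetical value, not one stated by the authors.
    """
    merged = list(primary)
    for text, box in secondary:
        # A secondary word counts as a duplicate if it overlaps any primary box.
        if all(iou(box, pbox) < threshold for _, pbox in primary):
            merged.append((text, box))
    return merged


primary = [("Total", (0, 0, 10, 10))]
secondary = [("Tota1", (1, 0, 10, 10)), ("42.00", (20, 0, 30, 10))]
merged = union_ocr(primary, secondary)
```

Here the second engine's "Tota1" overlaps the first engine's "Total" heavily and is dropped, while "42.00", which the first engine missed, is added.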

Tu, Yi, et al. "LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding." arXiv preprint arXiv:2305.18721 (2023).

Du, Zhengxiao, et al. "Glm: General language model pretraining with autoregressive blank infilling." arXiv preprint arXiv:2103.10360 (2021).

method: sample-1 2023-03-20

Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Email: zzr666@mail.ustc.edu.cn

Description: 1. We use UniLM[2] and LiLT[3] as decoders to exploit text and layout information; OCR results, sorted by manual rules, are fed into the decoders to predict the target.
2. We ensemble DocPrompt[1], UniLM[2] and LiLT[3].
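
The description does not say which manual rules are used to sort the OCR results. A common heuristic is sketched below purely as an assumption: group words into text lines by vertical center, then read each line left to right. The function name `sort_reading_order`, the `(text, box)` format, and the `line_tol` parameter are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Word = Tuple[str, Box]                   # (recognized text, bounding box)


def sort_reading_order(words: List[Word], line_tol: float = 0.5) -> List[Word]:
    """Order OCR words top-to-bottom, then left-to-right within each line.

    A word joins the current line when its vertical center lies within
    line_tol * height of the line's first word (a hypothetical rule).
    """
    by_row = sorted(words, key=lambda w: (w[1][1] + w[1][3]) / 2)
    lines: List[List[Word]] = []
    for word in by_row:
        _, (_, y1, _, y2) = word
        center = (y1 + y2) / 2
        if lines:
            ref = lines[-1][0][1]  # box of the first word in the current line
            ref_center = (ref[1] + ref[3]) / 2
            ref_height = ref[3] - ref[1]
            if abs(center - ref_center) <= line_tol * ref_height:
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, sort by the left edge of the box.
    return [w for line in lines for w in sorted(line, key=lambda w: w[1][0])]


words = [
    ("world", (50, 0, 90, 10)),
    ("Total", (0, 20, 40, 30)),
    ("Hello", (0, 1, 40, 11)),
    ("9.99", (50, 21, 90, 31)),
]
ordered_text = [text for text, _ in sort_reading_order(words)]
```

For the four words above, the small vertical jitter between "Hello" and "world" stays within the tolerance, so they are treated as one line.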

[1] https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/README_ch.md

[2] https://github.com/microsoft/unilm/blob/master/s2s-ft/

[3] Jiapeng Wang, Lianwen Jin and Kai Ding. LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding. 2022, ACL.

method: LayoutLMv3 2023-03-14

Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Huiwen Shi (石惠文)

Affiliation: TencentOCR

Description: Based on a large pretrained model and the LayoutLMv3 architecture, with some pre- and post-processing methods.

Ranking Table

Date        Method                 score   score1  score2
2023-10-09  OpenDoc(single model)  78.98%  82.69%  64.15%
2023-03-20  sample-1               78.71%  82.07%  65.27%
2023-03-14  LayoutLMv3             77.35%  80.01%  66.71%
2023-03-13  LayoutLMv3             76.90%  79.58%  66.20%
2023-03-17  KIE-Brain3             71.44%  74.90%  57.59%
2023-03-17  KIE-Brainer2           71.24%  74.82%  56.92%
2023-03-17  KIE-Brain              71.24%  74.87%  56.69%
2023-03-16  zero-shot-qa           70.75%  74.24%  56.81%
2023-03-15  zero shot qa           68.23%  71.89%  53.60%
2023-03-17  task3-2                62.59%  65.52%  50.85%
2023-03-17  task3_1                56.11%  58.31%  47.33%
2023-03-17  task3_0                47.39%  49.16%  40.29%
2023-03-13  task3_base             43.70%  46.09%  34.15%
2023-03-10  test                   2.03%   2.30%   0.97%
2023-03-13  Donut_VIE              1.37%   1.47%   1.01%
2023-03-13  first commit           0.00%   0.00%   0.00%

Ranking Graphic