Results - ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

method: OpenDoc(single model), 2023-10-09

Authors: Huan Chen, Ya Guo, Yi Tu, Jinyang Tang, Chong Zhang, Huijia Zhu

Affiliation: Ant Group

Description: 1. We connect a LayoutMask-0.1b encoder to an AntGLM-10b decoder through a linear projection.
2. We merge two OCR results with a union strategy based on IoU.
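The description does not say how the two OCR results are merged, so the following is only a minimal sketch of one plausible IoU-based union, not the authors' implementation. It assumes each OCR result is a list of (box, text) pairs with axis-aligned boxes, keeps every item from the first result, and adds an item from the second result only when it overlaps no box from the first. The names `iou` and `union_ocr`, the box layout, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
# Assumed OCR item: (bounding box, recognized text).
OcrItem = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_ocr(primary: List[OcrItem], secondary: List[OcrItem],
              threshold: float = 0.5) -> List[OcrItem]:
    """Keep all primary items; add secondary items that match no primary box.

    A secondary item counts as a duplicate when its IoU with any primary
    box reaches the (hypothetical) threshold.
    """
    merged = list(primary)
    for box, text in secondary:
        if all(iou(box, p_box) < threshold for p_box, _ in primary):
            merged.append((box, text))
    return merged


if __name__ == "__main__":
    engine_a = [((0, 0, 10, 10), "Total")]
    engine_b = [((1, 1, 10, 10), "Tota1"), ((20, 20, 30, 30), "42.00")]
    for box, text in union_ocr(engine_a, engine_b):
        print(box, text)
```

In this sketch the first engine wins on overlapping regions, and the second engine only contributes text lines the first one missed. Which engine to prefer, and at what threshold, would need tuning on real data.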

Tu, Yi, et al. "LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding." arXiv preprint arXiv:2305.18721 (2023).

Du, Zhengxiao, et al. "Glm: General language model pretraining with autoregressive blank infilling." arXiv preprint arXiv:2103.10360 (2021).

method: sample-1, 2023-03-20

Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Email: zzr666@mail.ustc.edu.cn

Description: 1. We use UniLM [2] and LiLT [3] as decoders to exploit text and layout information; OCR results sorted by manual rules are fed into the decoders to predict the target.
2. We combine DocPrompt [1], UniLM [2] and LiLT [3].

[1] https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/README_ch.md

[2] https://github.com/microsoft/unilm/blob/master/s2s-ft/

[3] Jiapeng Wang, Lianwen Jin and Kai Ding. LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding. 2022, ACL.

method: LayoutLMv3, 2023-03-14

Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Huiwen Shi (石惠文)

Affiliation: TencentOCR

Description: Based on a large pretrained model and the LayoutLMv3 architecture, with some pre- and post-processing methods.

Ranking Table


Date	Method	score	score1	score2
2023-10-09	OpenDoc(single model)	78.98%	82.69%	64.15%
2023-03-20	sample-1	78.71%	82.07%	65.27%
2023-03-14	LayoutLMv3	77.35%	80.01%	66.71%
2023-03-13	LayoutLMv3	76.90%	79.58%	66.20%
2023-03-17	KIE-Brain3	71.44%	74.90%	57.59%
2023-03-17	KIE-Brainer2	71.24%	74.82%	56.92%
2023-03-17	KIE-Brain	71.24%	74.87%	56.69%
2023-03-16	zero-shot-qa	70.75%	74.24%	56.81%
2023-03-15	zero shot qa	68.23%	71.89%	53.60%
2023-03-17	task3-2	62.59%	65.52%	50.85%
2023-03-17	task3_1	56.11%	58.31%	47.33%
2023-03-17	task3_0	47.39%	49.16%	40.29%
2023-03-13	task3_base	43.70%	46.09%	34.15%
2023-03-10	test	2.03%	2.30%	0.97%
2023-03-13	Donut_VIE	1.37%	1.47%	1.01%
2023-03-13	first commit	0.00%	0.00%	0.00%
