- Task 1 - E2E Complex Entity Linking
- Task 2 - E2E Complex Entity Labeling
- Task 3 - E2E Zero-shot Structured Text Extraction
- Task 4 - E2E Few-shot Structured Text Extraction
Method: LayoutLMV3&StrucText (submitted 2023-03-21)
Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Qingxiang Lin (林庆祥), Fan Yang (杨帆)
Affiliation: TencentOCR
Description: Our methods are based on the LayoutLMv3 and StrucText v1 architectures; all models are fine-tuned from large pre-trained LayoutLM and StrucText checkpoints. During training and testing, we apply preprocessing that merges and splits badly detected boxes. Because the entity labels of key-value (kv) pair boxes are ignored, we use a model trained on Task 1 images to predict kv relations for the text boxes in the Task 2 training/testing images. We then add two extra label classes (question/answer) and map the original labels onto them (other -> question/answer) to ease training. Likewise, during testing we use the kv-prediction model to filter out text boxes that participate in kv relations, and use the model trained on Task 2 to predict entity labels for the remaining boxes. Finally, we ensemble the predictions of the different models based on scores and rules, and apply postprocessing that merges texts sharing the same entity label to produce the final output (see the sketch below).
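To make the kv-based relabeling and filtering concrete, here is a minimal Python sketch of those two steps. The function names, the `predict` interface, and the dict-like box records are our assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of the kv-based label remapping (training) and
# box filtering (testing) described above. Model objects are assumed
# to expose a predict() method; box records are plain dicts.

def remap_labels_for_training(boxes, kv_model):
    """Map 'other' boxes that the Task-1 kv model identifies as part of
    a key-value pair onto the added question/answer classes."""
    kv_roles = kv_model.predict(boxes)  # {box_id: 'question'|'answer'|None}
    for box in boxes:
        role = kv_roles.get(box["id"])
        if box["label"] == "other" and role is not None:
            box["label"] = role  # other -> question/answer
    return boxes

def predict_entity_labels(boxes, kv_model, task2_model):
    """At test time, drop boxes that participate in kv relations and let
    the Task-2 model label only the remaining boxes."""
    kv_roles = kv_model.predict(boxes)
    remaining = [b for b in boxes if kv_roles.get(b["id"]) is None]
    return task2_model.predict(remaining)
```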
Method: LayoutLM&StrucText (submitted 2023-03-20)
Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Qingxiang Lin (林庆祥), Fan Yang (杨帆)
Affiliation: TencentOCR
Description: Same as the LayoutLMV3&StrucText (2023-03-21) submission above.
Method: sample-3 (submitted 2023-03-21)
Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du
Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research
Email: zzr666@mail.ustc.edu.cn
Description: 1. We use GraphDoc [1] to perform bounding-box classification, which exploits textual, layout, and visual information simultaneously.
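As an illustration of classifying a box from combined text, layout, and visual features, here is a minimal PyTorch sketch. The fusion scheme, feature dimensions, and class count are assumptions for illustration, not GraphDoc's actual architecture.

```python
import torch
import torch.nn as nn

class BoxClassifier(nn.Module):
    """Illustrative multimodal box classifier: fuses per-box text,
    layout (coordinates), and visual features before a linear head.
    Dimensions and additive fusion are assumptions, not GraphDoc's."""

    def __init__(self, text_dim=768, vis_dim=256, hidden=512, num_classes=10):
        super().__init__()
        self.layout_proj = nn.Linear(4, hidden)       # (x0, y0, x1, y1), normalized
        self.text_proj = nn.Linear(text_dim, hidden)  # e.g. embedding of the box's OCR text
        self.vis_proj = nn.Linear(vis_dim, hidden)    # e.g. RoI-pooled image feature
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, text_feat, box_coords, vis_feat):
        fused = (self.text_proj(text_feat)
                 + self.layout_proj(box_coords)
                 + self.vis_proj(vis_feat))
        return self.head(fused)  # per-box class logits

# Usage with a batch of 8 boxes and random stand-in features:
model = BoxClassifier()
logits = model(torch.randn(8, 768), torch.rand(8, 4), torch.randn(8, 256))
```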
Date | Method | Score1 | Score2 | Score
---|---|---|---|---
2023-03-21 | LayoutLMV3&StrucText | 57.78% | 55.32% | 57.29%
2023-03-20 | LayoutLM&StrucText | 55.65% | 52.99% | 55.12%
2023-03-21 | sample-3 | 47.15% | 41.91% | 46.10%
2023-03-21 | sample-1 | 46.35% | 41.15% | 45.31%
2023-03-21 | Task 1 transfer learning LiLT + Task 3 transfer learning LiLT + LiLT + LayoutLMv3 ensemble | 45.70% | 40.20% | 44.60%
2023-03-21 | LayoutMask-v3 | 44.79% | 42.53% | 44.34%
2023-03-21 | LayoutMask-v1 | 44.76% | 42.41% | 44.29%
2023-03-21 | LayoutMask-v2 | 44.65% | 41.87% | 44.09%
2023-03-20 | Pre-trained model based entity extraction (ro) | 44.98% | 40.06% | 43.99%
2023-03-21 | sample-2 | 44.83% | 40.65% | 43.99%
2023-03-20 | Pre-trained model based entity extraction (roxy) | 44.96% | 40.06% | 43.98%
2023-03-20 | Pre-trained model based entity extraction (split_ro) | 44.61% | 39.49% | 43.59%
2023-03-20 | Pre-trained model based entity extraction (split_roxy) | 44.60% | 39.49% | 43.58%
2023-03-21 | EXO-brain for KIE | 44.02% | 39.63% | 43.14%
2023-03-21 | Ex-brain for KIE | 44.00% | 39.46% | 43.09%
2023-03-21 | Ex-brain for KIE | 43.66% | 39.30% | 42.79%
2023-03-21 | multi-modal based KIE through model fusion | 42.42% | 37.97% | 41.53%
2023-03-20 | Aaaa | 42.03% | 37.14% | 41.05%
2023-03-21 | multi-modal based KIE through model fusion of different model | 41.94% | 36.90% | 40.93%
2023-03-20 | donut | 41.64% | 37.65% | 40.84%
2023-03-20 | multi-modal based KIE using LayoutLMv3 | 41.64% | 36.77% | 40.67%
2023-03-21 | multi-modal based KIE through model fusion | 41.28% | 37.37% | 40.50%
2023-03-20 | Ant-FinCV | 41.61% | 35.98% | 40.48%
2023-03-20 | result of zhang | 41.66% | 35.65% | 40.46%
2023-03-19 | result of zhang | 40.63% | 39.13% | 40.33%
2023-03-21 | Ex-brain for KIE | 41.38% | 35.14% | 40.13%
2023-03-17 | multi-modal based KIE using LayoutLMv3 | 40.64% | 36.61% | 39.83%
2023-03-21 | result v5 | 40.93% | 35.03% | 39.75%
2023-03-20 | result of zhang v4 | 40.71% | 35.39% | 39.65%
2023-03-20 | result of zhang v3 | 40.57% | 34.79% | 39.42%
2023-03-21 | KIE-ner-ocrapi | 35.47% | 42.44% | 36.87%
2023-03-20 | KIE-ner | 35.41% | 42.61% | 36.85%
2023-03-17 | 20230317 | 0.45% | 0.68% | 0.50%
2023-03-16 | test | 0.02% | 0.00% | 0.01%
2023-03-16 | ttt | 0.02% | 0.00% | 0.01%
2023-03-20 | test | 0.02% | 0.00% | 0.01%
2023-03-20 | 1 | 0.02% | 0.00% | 0.01%
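Across the rows above, the overall Score is consistent with a 0.8/0.2 weighted average of the two sub-scores, i.e. Score = 0.8 * Score1 + 0.2 * Score2 (e.g. 0.8 * 57.78 + 0.2 * 55.32 = 57.29). This weighting is inferred from the table itself, not from an official task description; a quick sanity check:

```python
# Weighting inferred from the table, not an official formula:
# Score ~= 0.8 * Score1 + 0.2 * Score2 (inputs are rounded to 0.01%).
rows = [(57.78, 55.32, 57.29), (44.79, 42.53, 44.34), (0.45, 0.68, 0.50)]
for s1, s2, total in rows:
    assert abs(0.8 * s1 + 0.2 * s2 - total) <= 0.01, (s1, s2, total)
print("0.8/0.2 weighting matches the sampled rows")
```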