method: GraphDoc+Classify+Merge (2023-05-24)

Authors: Yan Wang, Jiefeng Ma, Zhenrong Zhang, Pengfei Hu, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Description: We pre-trained several GraphDoc models on the provided unlabelled documents under different configurations, then fine-tuned them on the training set for 200-500 epochs. The models classify OCR boxes into field categories, and a proposed Merger module then aggregates the classified boxes.
We also applied some pre-/post-processing based on the text content and the distances between OCR boxes. Finally, we adopted model ensembling to further enhance system performance.
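
A minimal sketch of what such a distance-based aggregation step might look like: same-category OCR boxes are greedily merged whenever the gap between them is small. This is an illustration under assumed data structures, thresholds, and merge strategy, not the authors' actual Merger module.

```python
# Greedy, distance-based merge of same-category OCR boxes (illustrative sketch).
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float
    category: str


def close_enough(a: Box, b: Box, max_gap: float = 10.0) -> bool:
    # Gap between two axis-aligned boxes (0 when they overlap on that axis).
    gap_x = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    gap_y = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return max(gap_x, gap_y) <= max_gap


def merge_boxes(boxes: List[Box], max_gap: float = 10.0) -> List[Box]:
    """Greedily merge boxes that share a category and lie within max_gap of each other."""
    merged: List[Box] = []
    for box in boxes:
        for m in merged:
            if m.category == box.category and close_enough(m, box, max_gap):
                # Grow the existing merged box to cover the new one.
                m.x0, m.y0 = min(m.x0, box.x0), min(m.y0, box.y0)
                m.x1, m.y1 = max(m.x1, box.x1), max(m.y1, box.y1)
                break
        else:
            merged.append(Box(box.x0, box.y0, box.x1, box.y1, box.category))
    return merged
```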

method: Baseline+Ensemble+Pseudo+Post-Processing (2023-05-16)

Authors: UIT@AICLUB_TAB

Affiliation: UIT - University of Information Technology - VNUHCM

Email: 22520121@gm.uit.edu.vn

Description: Our approach is based on the baseline checkpoint with some improvements. We trained/used the following models:
1. RoBERTa-base trained from scratch with FGM adversarial training and the Lion optimizer on synthetic data for 30 epochs, then trained on the annotated data (see the FGM sketch after this list).
2. RoBERTa (our checkpoint) with the Lion optimizer.
3. RoBERTa-base (checkpoint).
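
For reference for item 1, below is a minimal FGM (Fast Gradient Method) adversarial-training sketch in PyTorch. The class, the embedding-parameter name, and the epsilon value are illustrative assumptions, not the team's code.

```python
# Minimal FGM (Fast Gradient Method) adversarial training, PyTorch-style.
import torch


class FGM:
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        # Perturb embedding weights along the (normalised) gradient direction.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Undo the perturbation after the adversarial backward pass.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}


# Typical step: loss.backward(); fgm.attack(); adv_loss.backward(); fgm.restore(); optimizer.step()
```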

We then ensemble the models by taking the union of words marked with any of the 55 field types, followed by post-processing.
Finally, we used the ensembled model to predict the unlabelled data, giving us pseudo-labelled data; we used this data to pre-train the three models and then trained them on the annotated data.
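
A sketch of the union-style ensembling described above, assuming each model outputs a set of predicted field types per word; the data layout, helper name, and example field-type names are hypothetical.

```python
# Union-style ensembling: a word keeps a field type if any model predicted it.
from typing import Dict, List, Set


def ensemble_by_union(per_model_preds: List[Dict[int, Set[str]]]) -> Dict[int, Set[str]]:
    """Merge per-word field-type predictions from several models by set union."""
    merged: Dict[int, Set[str]] = {}
    for preds in per_model_preds:
        for word_id, field_types in preds.items():
            merged.setdefault(word_id, set()).update(field_types)
    return merged


# Example with three models disagreeing on word 7:
model_a = {7: {"amount_total"}}
model_b = {7: {"amount_total", "currency_code"}}
model_c = {7: set(), 8: {"date_issue"}}
print(ensemble_by_union([model_a, model_b, model_c]))
# {7: {'amount_total', 'currency_code'}, 8: {'date_issue'}}
```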

Pipeline: https://ibb.co/4MWcXgb

method: baseline - RoBERTa-base (2023-05-02)

Authors: Organizers

Affiliation: Rossum.ai, Czech Technical University in Prague, University of La Rochelle

Description: Baseline method. Uses a multi-label NER formulation with RoBERTa-base as the backbone.
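
A minimal sketch of such a multi-label token-classification setup: a per-token sigmoid head over the field types on top of RoBERTa-base, trained with binary cross-entropy. This follows the description above rather than the organizers' exact implementation; the class name, argument names, and the field-type count (taken from the 55 types mentioned above) are assumptions.

```python
# Per-token multi-label classification head on top of RoBERTa-base.
import torch.nn as nn
from transformers import RobertaModel


class MultiLabelTokenClassifier(nn.Module):
    def __init__(self, num_field_types: int = 55):
        super().__init__()
        self.backbone = RobertaModel.from_pretrained("roberta-base")
        self.head = nn.Linear(self.backbone.config.hidden_size, num_field_types)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.head(hidden)  # (batch, seq_len, num_field_types)
        if labels is None:
            return logits
        # A token may carry several field types at once, hence BCE rather than softmax CE.
        loss = nn.BCEWithLogitsLoss()(logits, labels.float())
        return loss, logits
```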

Ranking Table

Date       | Method                                                              | AP     | F1     | Precision | Recall
2023-05-24 | GraphDoc+Classify+Merge                                             | 48.49% | 57.65% | 56.72%    | 58.61%
2023-05-16 | Baseline+Ensemble+Pseudo+Post-Processing                            | 44.07% | 47.58% | 41.05%    | 56.59%
2023-05-02 | baseline - RoBERTa-base                                             | 39.43% | 52.42% | 50.35%    | 54.65%
2023-05-08 | YOLOv8X+Grid                                                        | 39.34% | 52.92% | 55.17%    | 50.84%
2023-05-02 | baseline - RoBERTa-base with synthetic pre-training                 | 38.42% | 51.25% | 49.22%    | 53.44%
2023-05-02 | baseline - LayoutLMv3 with unsupervised and synthetic pre-training  | 33.84% | 48.49% | 49.60%    | 47.44%
2023-05-02 | baseline - LayoutLMv3 with unsupervised pre-training                | 32.31% | 45.01% | 44.05%    | 46.01%
2023-05-25 | SRCB Submission on Key Information Localization and Extraction      | 32.09% | 54.61% | 53.68%    | 55.57%

Ranking Graphic
