method: DocGptVQA (2023-04-20)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We integrated the prediction outputs of the UDOP model and BLIP-2 to enhance our results, and we optimized the image encoder and included page-number features to address the challenge of multi-page documents. We also used GPT to generate Python-like modular programs.
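The submission does not say how the two models' outputs are merged. Below is a minimal sketch, assuming each model returns an answer with a confidence score and the more confident answer wins; the "answer"/"confidence" field names and the tie-breaking rule are our assumptions, not the authors' method.

# Hypothetical prediction-level ensemble of UDOP and BLIP-2 outputs.
# The field names and the keep-the-more-confident rule are assumptions;
# the submission does not specify the merging scheme.
def ensemble_answers(udop_pred: dict, blip2_pred: dict) -> str:
    if udop_pred["confidence"] >= blip2_pred["confidence"]:
        return udop_pred["answer"]
    return blip2_pred["answer"]

# Toy usage with made-up predictions:
udop = {"answer": "March 2021", "confidence": 0.82}
blip2 = {"answer": "2021", "confidence": 0.64}
print(ensemble_answers(udop, blip2))  # -> March 2021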

method: DocBlipVQA (2023-04-16)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We integrated the prediction outputs of the UDOP model and BLIP-2 to enhance our results, and we optimized the image encoder and included page-number features to address the challenge of multi-page documents.
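The page-number features are not described further. One plausible reading, sketched below under our own assumptions (the module name, feature sizes, and additive combination are all hypothetical), is a learned per-page embedding added to each page's visual features so the encoder can distinguish pages:

# A guessed sketch of page-number features for a multi-page document:
# a learned embedding per page index is broadcast over that page's
# visual tokens. Sizes and names are assumptions, not the authors' code.
import torch
import torch.nn as nn

class PageAwareEncoder(nn.Module):
    def __init__(self, feat_dim: int = 768, max_pages: int = 64):
        super().__init__()
        self.page_embed = nn.Embedding(max_pages, feat_dim)

    def forward(self, page_feats: torch.Tensor, page_ids: torch.Tensor) -> torch.Tensor:
        # page_feats: (num_pages, num_tokens, feat_dim); page_ids: (num_pages,)
        return page_feats + self.page_embed(page_ids).unsqueeze(1)

# Toy usage: 3 pages with 16 visual tokens each.
feats = torch.randn(3, 16, 768)
out = PageAwareEncoder()(feats, torch.arange(3))
print(out.shape)  # torch.Size([3, 16, 768])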

method: Multi-Modal T5 VQA (2023-04-20)

Authors: Hyunbyung Park

Affiliation: Upstage KR

Description: In this work, we used a T5 multi-modal Visual Question Answering (VQA) model to address the challenges in document understanding. Our approach combines pretraining, fine-tuning, and prediction techniques to improve performance on DUDE2023. We leveraged the following datasets for training and evaluation: ScienceQA, VQAonBD2023, HotpotQA, MPDocVQA, and DUDE2023. The methodology involves two primary pretraining steps, followed by a final fine-tuning phase:

Single-Page VQA Pretraining: In the first pretraining step, we used a combination of the ScienceQA, VQAonBD2023, HotpotQA, MPDocVQA, and DUDE2023 datasets. The model was pretrained with two objectives: Masked Language Modeling (MLM) and VQA.

Multi-Page VQA Pretraining: The second pretraining step trained the model on the MPDocVQA and DUDE2023 datasets with three objectives: MLM, Page Order Matching, and VQA. This step aimed to enhance the model's ability to understand and process multi-page documents effectively.

After completing the pretraining steps, we fine-tuned the model using the single VQA objective on the DUDE2023 dataset only. For the prediction stage, we processed each page of the input PDF documents separately and obtained individual predictions; the answers from all pages were then combined to generate the final output (a sketch of this stage follows below).
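A minimal sketch of the prediction stage, assuming the model returns a confidence per page and that "combined" means keeping the highest-confidence answer; run_vqa_model is a hypothetical stand-in for the fine-tuned T5 model, with a toy heuristic inside it only so the sketch runs end to end.

def run_vqa_model(page_text: str, question: str) -> tuple[str, float]:
    """Stand-in for the fine-tuned T5 VQA model; returns (answer, confidence)."""
    # Toy heuristic: pages mentioning the last word of the question
    # get a high confidence. A real implementation runs the model here.
    topic = question.lower().rstrip("?").split()[-1]
    if topic in page_text.lower():
        return page_text, 0.9
    return "", 0.1

def answer_document(pages: list[str], question: str) -> str:
    """Answer each page independently, then keep the best-scoring answer."""
    candidates = [run_vqa_model(page, question) for page in pages]
    best_answer, _ = max(candidates, key=lambda c: c[1])
    return best_answer

pages = ["Total revenue in 2022: 4.2M USD", "Appendix: glossary of terms"]
print(answer_document(pages, "What was the revenue?"))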

Ranking Table

Answer Calibration is reported as ECE and AURC, OOD Detection as AUROC, and the last four columns break ANLS down per answer type.

Date | Method | ANLS | ECE | AURC | AUROC | Extractive | Abstractive | List of answers | Unanswerable
2023-04-20 | DocGptVQA | 0.5002 | 0.2240 | 0.4210 | 0.8744 | 0.5186 | 0.4832 | 0.2822 | 0.6204
2023-04-16 | DocBlipVQA | 0.4762 | 0.3065 | 0.4860 | 0.7829 | 0.5069 | 0.4631 | 0.3073 | 0.5522
2023-04-20 | Multi-Modal T5 VQA | 0.3790 | 0.5931 | 0.5931 | 0.5000 | 0.4155 | 0.4024 | 0.2021 | 0.3467
2023-04-19 | Multi-Modal T5 VQA | 0.3789 | 0.5931 | 0.5931 | 0.5000 | 0.4154 | 0.4022 | 0.2031 | 0.3467
2023-04-18 | Hi-VT5-beamsearch | 0.3574 | 0.6104 | 0.6104 | 0.5000 | 0.2831 | 0.3298 | 0.1060 | 0.6290
2023-04-21 | Hi-VT5-beamsearch with token type embeddings | 0.3559 | 0.2803 | 0.4603 | 0.4876 | 0.3095 | 0.3515 | 0.1176 | 0.5250

Ranking Graphic
