Results - Document UnderstanDing of Everything 😎

Method: Snowflake Arctic-TILT 0.8B

Authors: Snowflake AI Research

Affiliation: Snowflake

Description: TILT encoder-decoder model improved with novel modality fusion, attention sparsity, and an enhanced training recipe.

Method: GPT-4 Vision Turbo + Azure OCR

Authors: Unofficial

Description: GPT-4 Vision Turbo with 2048px images (longer side) and output of Azure OCR. See the paper for details.

Method: GRAM

Authors: Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Shahar Tsiper, Elad Ben Avraham, Aviad Aberdam, Ron Litman

Affiliation: AWS AI Labs and Technion Israel

Description: GRAM model based on DocFormerv2, trained on the DUDE and Multi-Page DocVQA datasets.

		Answer	Calibration		OOD Detection	ANLS per Answer type
Date	Method	ANLS	ECE	AURC	AUROC	Extractive	Abstractive	List of answers	Unanswerable
2024-08-30	Snowflake Arctic-TILT 0.8B	0.5809	0.0763	0.2529	0.5289	0.6271	0.5645	0.4669	0.6261
2024-05-31	GPT-4 Vision Turbo + Azure OCR	0.5392	0.5583	0.4317	0.5000	0.5973	0.5248	0.5785	0.5131
2024-01-16	GRAM	0.5336	0.4404	0.4404	0.5000	0.5683	0.5232	0.1996	0.6543
2024-01-16	GRAM C-Former	0.5097	0.4613	0.4613	0.5000	0.5515	0.5046	0.1726	0.6104
2023-04-20	DocGptVQA	0.5002	0.2240	0.4210	0.8744	0.5186	0.4832	0.2822	0.6204
2025-03-28	qwen-retrival	0.4905	0.4844	0.4844	0.5000	0.6754	0.6404	0.1029	0.0000
2023-04-16	DocBlipVQA	0.4762	0.3065	0.4860	0.7829	0.5069	0.4631	0.3073	0.5522
2023-03-27	model_0327	0.4659	0.1904	0.4398	0.8854	0.5521	0.4660	0.1786	0.4726
2025-04-23	qwen	0.4575	0.5148	0.5148	0.5000	0.6319	0.5940	0.1199	0.0000
2023-03-16	T5-concat	0.3867	0.2489	0.4343	0.5113	0.3727	0.3750	0.1681	0.5289
2023-04-20	Multi-Modal T5 VQA	0.3790	0.5931	0.5931	0.5000	0.4155	0.4024	0.2021	0.3467
2023-04-19	Multi-Modal T5 VQA	0.3789	0.5931	0.5931	0.5000	0.4154	0.4022	0.2031	0.3467
2023-04-18	Hi-VT5-beamsearch	0.3574	0.6104	0.6104	0.5000	0.2831	0.3298	0.1060	0.6290
2023-04-21	Hi-VT5-beamsearch with token type embeddings	0.3559	0.2803	0.4603	0.4876	0.3095	0.3515	0.1176	0.5250
2023-04-26	QAP	0.1159	0.4168	0.9076	0.5014	0.0009	0.0007	0.0000	0.6199
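
The ANLS columns report Average Normalized Levenshtein Similarity. As a rough illustration of how the basic metric is commonly defined, the Python sketch below scores each prediction against its closest reference answer and sets the score to zero when the normalized edit distance reaches a threshold. The function names, the lowercasing and whitespace stripping, and the 0.5 threshold are assumptions made for illustration. The benchmark's official scorer may handle list and unanswerable questions differently, and the sketch assumes every question has at least one reference answer.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def nls(pred: str, gold: str, threshold: float = 0.5) -> float:
    """Normalized Levenshtein similarity, cut to 0 at the threshold."""
    # Assumed normalization: strip whitespace and lowercase both strings.
    pred, gold = pred.strip().lower(), gold.strip().lower()
    longest = max(len(pred), len(gold))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    nl = levenshtein(pred, gold) / longest
    return 1.0 - nl if nl < threshold else 0.0


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Mean over questions of the best similarity to any reference answer."""
    if not predictions:
        return 0.0
    scores = [max(nls(p, g, threshold) for g in golds)
              for p, golds in zip(predictions, references)]
    return sum(scores) / len(scores)
```

For instance, `anls(["abcd"], [["abcx"]])` gives 0.75: one substitution over four characters is a normalized distance of 0.25, which is below the threshold. A prediction that shares no characters with any reference scores 0.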