method: Snowflake Arctic-TILT 0.8B (2024-08-30)
Authors: Snowflake AI Research
Affiliation: Snowflake
Description: TILT encoder-decoder model improved with a novel modality fusion, attention sparsity, and an enhanced training recipe.
method: GPT-4 Vision Turbo + Azure OCR (2024-05-31)
Authors: Unofficial
Description: GPT-4 Vision Turbo given images of 2048 px on the longer side, together with the Azure OCR output. See the paper for details.
method: GRAM (2024-01-16)
Authors: Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Shahar Tsiper, Elad Ben Avraham, Aviad Aberdam, Ron Litman
Affiliation: AWS AI Labs and Technion Israel
Description: GRAM model based on DocFormerv2, trained on the DUDE and Multi-Page DocVQA datasets.
Date | Method | ANLS (answer) | ECE (calibration) | AURC (calibration) | AUROC (OOD detection) | ANLS: Extractive | ANLS: Abstractive | ANLS: List of answers | ANLS: Unanswerable
---|---|---|---|---|---|---|---|---|---
2024-08-30 | Snowflake Arctic-TILT 0.8B | 0.5809 | 0.0763 | 0.2529 | 0.5289 | 0.6271 | 0.5645 | 0.4669 | 0.6261
2024-05-31 | GPT-4 Vision Turbo + Azure OCR | 0.5392 | 0.5583 | 0.4317 | 0.5000 | 0.5973 | 0.5248 | 0.5785 | 0.5131
2024-01-16 | GRAM | 0.5336 | 0.4404 | 0.4404 | 0.5000 | 0.5683 | 0.5232 | 0.1996 | 0.6543
2024-01-16 | GRAM C-Former | 0.5097 | 0.4613 | 0.4613 | 0.5000 | 0.5515 | 0.5046 | 0.1726 | 0.6104
2023-04-20 | DocGptVQA | 0.5002 | 0.2240 | 0.4210 | 0.8744 | 0.5186 | 0.4832 | 0.2822 | 0.6204
2023-04-16 | DocBlipVQA | 0.4762 | 0.3065 | 0.4860 | 0.7829 | 0.5069 | 0.4631 | 0.3073 | 0.5522
2023-03-27 | model_0327 | 0.4659 | 0.1904 | 0.4398 | 0.8854 | 0.5521 | 0.4660 | 0.1786 | 0.4726
2023-03-16 | T5-concat | 0.3867 | 0.2489 | 0.4343 | 0.5113 | 0.3727 | 0.3750 | 0.1681 | 0.5289
2023-04-20 | Multi-Modal T5 VQA | 0.3790 | 0.5931 | 0.5931 | 0.5000 | 0.4155 | 0.4024 | 0.2021 | 0.3467
2023-04-19 | Multi-Modal T5 VQA | 0.3789 | 0.5931 | 0.5931 | 0.5000 | 0.4154 | 0.4022 | 0.2031 | 0.3467
2023-04-18 | Hi-VT5-beamsearch | 0.3574 | 0.6104 | 0.6104 | 0.5000 | 0.2831 | 0.3298 | 0.1060 | 0.6290
2023-04-21 | Hi-VT5-beamsearch with token type embeddings | 0.3559 | 0.2803 | 0.4603 | 0.4876 | 0.3095 | 0.3515 | 0.1176 | 0.5250
2023-04-26 | QAP | 0.1159 | 0.4168 | 0.9076 | 0.5014 | 0.0009 | 0.0007 | 0.0000 | 0.6199
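The table reports ANLS, ECE and AURC without defining them. The sketch below shows the textbook form of each, under stated assumptions: ANLS uses the usual threshold τ = 0.5 with lowercase and whitespace-stripped matching, ECE uses 10 equal-width confidence bins over a 0/1 correctness flag, and AURC is the mean selective risk over all coverage levels. The function names are hypothetical and this is not the official DUDE evaluator, which additionally has its own handling of list-of-answers and unanswerable questions that is not reproduced here. Higher is better for ANLS and AUROC; lower is better for ECE and AURC.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anls(preds: Sequence[str], golds: Sequence[Sequence[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity (assumed normalization: strip + lowercase).

    Per question: the best score over all gold answers of 1 - NL,
    where NL is the edit distance divided by the longer string's length;
    a match with NL >= tau scores 0.
    """
    total = 0.0
    for pred, answers in zip(preds, golds):
        p = pred.strip().lower()
        best = 0.0
        for gold in answers:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(preds)


def ece(confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Expected Calibration Error with equal-width bins (lo, hi]; 0.0 goes to the first bin."""
    n = len(confidences)
    err = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)        # empirical accuracy in the bin
        conf = sum(confidences[i] for i in idx) / len(idx)   # mean confidence in the bin
        err += len(idx) / n * abs(acc - conf)                 # weighted by bin size
    return err


def aurc(confidences: Sequence[float], losses: Sequence[float]) -> float:
    """Area under the risk-coverage curve (discrete form).

    Answers are sorted by descending confidence; the risk at coverage k/n is the
    mean loss of the k most confident answers, and AURC averages these risks.
    Ties keep their input order (Python's sort is stable).
    """
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    cum, risks = 0.0, []
    for k, i in enumerate(order, start=1):
        cum += losses[i]
        risks.append(cum / k)
    return sum(risks) / len(risks)
```

For example, `anls(["2024-08-30"], [["2024-08-31"]])` gives 0.9 (one edit over ten characters). In `aurc`, ranking the wrong answer first raises the score: confidences `[0.9, 0.1]` with losses `[0, 1]` give 0.25, while `[0.1, 0.9]` gives 0.75. That is the behaviour a confidence-aware metric is meant to reward.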