method: GPT-4 Vision Turbo + Azure OCR (2024-05-31)
Authors: Unofficial
Description: GPT-4 Vision Turbo with 2048px images (longer side) and output of Azure OCR. See the paper for details.
method: DocGptVQA (2023-04-20)
Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu
Affiliation: Lenovo Research
Description: We integrated the prediction outputs of the UDOP model and Blip2 to improve our results. We also optimized the image encoder and added page-number features to address the challenge of multi-page documents. GPT is used to generate Python-like modular programs.
method: DocBlipVQA (2023-04-16)
Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu
Affiliation: Lenovo Research
Description: We integrated the prediction outputs of the UDOP model and Blip2 to improve our results. We also optimized the image encoder and added page-number features to address the challenge of multi-page documents.
Date | Method | ANLS (Answer) | ECE (Calibration) | AURC (Calibration) | AUROC (OOD Detection) | ANLS: Extractive | ANLS: Abstractive | ANLS: List of answers | ANLS: Unanswerable
---|---|---|---|---|---|---|---|---|---
2024-05-31 | GPT-4 Vision Turbo + Azure OCR | 0.5392 | 0.5583 | 0.4317 | 0.5000 | 0.5973 | 0.5248 | 0.5785 | 0.5131
2023-04-20 | DocGptVQA | 0.5002 | 0.2240 | 0.4210 | 0.8744 | 0.5186 | 0.4832 | 0.2822 | 0.6204
2023-04-16 | DocBlipVQA | 0.4762 | 0.3065 | 0.4860 | 0.7829 | 0.5069 | 0.4631 | 0.3073 | 0.5522
2023-04-20 | Multi-Modal T5 VQA | 0.3790 | 0.5931 | 0.5931 | 0.5000 | 0.4155 | 0.4024 | 0.2021 | 0.3467
2023-04-19 | Multi-Modal T5 VQA | 0.3789 | 0.5931 | 0.5931 | 0.5000 | 0.4154 | 0.4022 | 0.2031 | 0.3467
2023-04-18 | Hi-VT5-beamsearch | 0.3574 | 0.6104 | 0.6104 | 0.5000 | 0.2831 | 0.3298 | 0.1060 | 0.6290
2023-04-21 | Hi-VT5-beamsearch with token type embeddings | 0.3559 | 0.2803 | 0.4603 | 0.4876 | 0.3095 | 0.3515 | 0.1176 | 0.5250
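
For readers unfamiliar with the ANLS columns, the following is a minimal sketch of how Average Normalized Levenshtein Similarity is commonly computed for a single-string answer. It assumes the usual threshold of 0.5 and case-insensitive, whitespace-stripped comparison. The function names (`levenshtein`, `anls_single`, `anls`) are illustrative and are not taken from the benchmark's evaluation code, which may treat list-type and unanswerable questions differently.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls_single(prediction: str, references: Sequence[str],
                threshold: float = 0.5) -> float:
    """Best similarity of one prediction against its accepted answers.

    Assumption: the score is 1 - NL when the normalized distance NL is
    below the threshold, and 0 otherwise.
    """
    pred = prediction.strip().lower()
    best = 0.0
    for ref in references:
        gold = ref.strip().lower()
        longest = max(len(pred), len(gold))
        nl = 0.0 if longest == 0 else levenshtein(pred, gold) / longest
        score = 1.0 - nl if nl < threshold else 0.0
        best = max(best, score)
    return best


def anls(predictions: Sequence[str],
         references: Sequence[Sequence[str]],
         threshold: float = 0.5) -> float:
    """Mean of the per-question scores; 0.0 for an empty input."""
    if not predictions:
        return 0.0
    scores: List[float] = [
        anls_single(p, refs, threshold)
        for p, refs in zip(predictions, references)
    ]
    return sum(scores) / len(scores)
```

Under these assumptions, a prediction that differs from the reference by a small typo receives partial credit, while one whose normalized edit distance reaches the threshold scores zero.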