method: DocGptVQA (2023-04-20)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We combined the prediction outputs of the UDOP model and Blip2 to improve our results, optimized the image encoder, and added page-number features to handle multi-page documents. We also used GPT to generate Python-like modular programs.
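The description does not say how the UDOP and Blip2 outputs were combined. A minimal sketch, assuming each model returns an answer string with a confidence score and that predictions are merged by soft voting (confidence pooled per normalized answer, then averaged over models). `Prediction` and `ensemble_answers` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed to be a probability in [0, 1]


def normalize(text: str) -> str:
    # Compare answers case-insensitively and ignore extra whitespace.
    return " ".join(text.lower().split())


def ensemble_answers(preds: list[Prediction]) -> Prediction:
    """Soft voting: pool confidence per normalized answer, keep the best one."""
    if not preds:
        raise ValueError("need at least one prediction")
    scores: dict[str, float] = {}
    surface: dict[str, str] = {}
    for p in preds:
        key = normalize(p.answer)
        scores[key] = scores.get(key, 0.0) + p.confidence
        surface.setdefault(key, p.answer.strip())  # first spelling seen wins
    best = max(scores, key=scores.get)
    # A model that did not produce `best` contributes 0, so dividing by the
    # number of models gives the average vote for the chosen answer.
    return Prediction(surface[best], scores[best] / len(preds))
```

When both models agree (for example `"2019"` at 0.62 and `" 2019"` at 0.55), the answer keeps their mean confidence of 0.585; when they disagree, the more confident answer wins but its score is halved, which flags the disagreement to any downstream calibration step.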

method: DocBlipVQA (2023-04-16)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We combined the prediction outputs of the UDOP model and Blip2 to improve our results, optimized the image encoder, and added page-number features to handle multi-page documents.
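How the page-number features were injected is not specified. One common option, shown here purely as an assumption, is to add a Transformer-style sinusoidal encoding of each page's index to that page's token features, so a single encoder over the concatenated pages can tell them apart. `page_encoding` and `add_page_features` are hypothetical helpers written in plain Python.

```python
import math


def page_encoding(page_idx: int, dim: int) -> list[float]:
    """Sinusoidal encoding of a page index (even dims: sin, odd dims: cos)."""
    enc = []
    for i in range(dim):
        # Each sin/cos pair of dimensions shares one frequency.
        freq = 1.0 / (10000 ** (2 * (i // 2) / dim))
        angle = page_idx * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def add_page_features(pages: list[list[list[float]]]) -> list[list[float]]:
    """Flatten per-page token features into one sequence, adding page encodings.

    pages[p][t] is the feature vector of token t on page p.
    """
    out = []
    for page_idx, tokens in enumerate(pages):
        dim = len(tokens[0]) if tokens else 0
        enc = page_encoding(page_idx, dim)
        for tok in tokens:
            out.append([t + e for t, e in zip(tok, enc)])
    return out
```

A learned page-embedding table would serve the same purpose; the sinusoidal form is used here only because it needs no training and works for any page count.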

Ranking Table

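Metric groups: Answer (ANLS); Calibration (ECE, AURC); OOD Detection (AUROC); ANLS per answer type (Extractive, Abstractive, List of answers, Unanswerable).

| Date | Method | ANLS | ECE | AURC | AUROC | Extractive | Abstractive | List of answers | Unanswerable |
|---|---|---|---|---|---|---|---|---|---|
| 2024-05-31 | GPT-4 Vision Turbo + Azure OCR | 0.5392 | 0.5583 | 0.4317 | 0.5000 | 0.5973 | 0.5248 | 0.5785 | 0.5131 |
| 2023-04-20 | DocGptVQA | 0.5002 | 0.2240 | 0.4210 | 0.8744 | 0.5186 | 0.4832 | 0.2822 | 0.6204 |
| 2023-04-16 | DocBlipVQA | 0.4762 | 0.3065 | 0.4860 | 0.7829 | 0.5069 | 0.4631 | 0.3073 | 0.5522 |
| 2023-04-20 | Multi-Modal T5 VQA | 0.3790 | 0.5931 | 0.5931 | 0.5000 | 0.4155 | 0.4024 | 0.2021 | 0.3467 |
| 2023-04-19 | Multi-Modal T5 VQA | 0.3789 | 0.5931 | 0.5931 | 0.5000 | 0.4154 | 0.4022 | 0.2031 | 0.3467 |
| 2023-04-18 | Hi-VT5-beamsearch | 0.3574 | 0.6104 | 0.6104 | 0.5000 | 0.2831 | 0.3298 | 0.1060 | 0.6290 |
| 2023-04-21 | Hi-VT5-beamsearch with token type embeddings | 0.3559 | 0.2803 | 0.4603 | 0.4876 | 0.3095 | 0.3515 | 0.1176 | 0.5250 |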
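The ranking is ordered by ANLS, and calibration is reported with ECE. A minimal sketch of both in plain Python, assuming the standard single-answer ANLS (normalized Levenshtein similarity, zeroed when the normalized distance reaches the usual threshold τ = 0.5, best match over the reference answers, averaged over questions) and equal-width-bin ECE. The leaderboard's official evaluation also treats list and unanswerable answers specially and defines per-prediction correctness for calibration in its own way; none of that is reproduced here, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions: list[str], references: list[list[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over questions."""
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)


def ece(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Expected Calibration Error with equal-width confidence bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; a confidence of exactly 0.0 goes to bin 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(acc - conf)
    return total
```

For example, `anls(["Lenovo Resarch"], [["Lenovo Research"]])` is 1 − 1/15 ≈ 0.933 (one missing character out of 15), while a completely different answer such as `"IBM"` against `"Lenovo"` scores 0. Two predictions at confidence 0.9, one right and one wrong, give an ECE of |0.5 − 0.9| = 0.4.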

Ranking Graphic
