Task 3 - Infographics VQA - Results
Method: Human Performance
Date: 2022-03-02
Authors: DocVQA Organizers
Affiliation: CVC-UAB, IIIT Hyderabad
Email: docvqa@cvc.uab.es
Description: Human performance as reported in the InfographicVQA paper
Method: qwen2-vl
Date: 2024-07-12
Authors: Qwen Team
Affiliation: Alibaba Group
Description: Qwen2-VL
Method: InternVL2-Pro (generalist)
Date: 2024-06-30
Authors: InternVL team
Affiliation: Shanghai AI Laboratory & Sensetime & Tsinghua University
Email: czcz94cz@gmail.com
Description: InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites (a pioneering open-source alternative to GPT-4V)
Demo: https://internvl.opengvlab.com/
Code: https://github.com/OpenGVLab/InternVL
Model: https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
Columns report per-category ANLS grouped by Answer type (Image span, Question span, Multiple spans, Non span), Evidence (Table/List, Textual, Visual object, Figure, Map), and Operation (Comparison, Arithmetic, Counting); Score is the overall ANLS.

Date | Method | Score | Image span | Question span | Multiple spans | Non span | Table/List | Textual | Visual object | Figure | Map | Comparison | Arithmetic | Counting
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2022-03-02 | Human Performance | 0.9718 | 0.9745 | 0.9777 | 0.9335 | 0.9716 | 0.9780 | 0.9789 | 0.9770 | 0.9699 | 0.9433 | 0.9712 | 0.9837 | 0.9544
2024-07-12 | qwen2-vl | 0.8469 | 0.8739 | 0.8708 | 0.7778 | 0.7424 | 0.8596 | 0.9430 | 0.7827 | 0.8170 | 0.7592 | 0.7295 | 0.8977 | 0.5793
2024-06-30 | InternVL2-Pro (generalist) | 0.8334 | 0.8681 | 0.8929 | 0.7350 | 0.6969 | 0.8335 | 0.9260 | 0.7757 | 0.8093 | 0.7186 | 0.7301 | 0.8584 | 0.5368
2024-09-25 | Molmo-72B | 0.8186 | 0.8513 | 0.8827 | 0.6821 | 0.7041 | 0.8184 | 0.9136 | 0.8062 | 0.7945 | 0.6960 | 0.7054 | 0.8188 | 0.5930
2024-11-20 | Eagle-2-9B | 0.7805 | 0.8158 | 0.8212 | 0.6386 | 0.6572 | 0.7736 | 0.9019 | 0.7395 | 0.7450 | 0.6317 | 0.6203 | 0.8062 | 0.5055
2024-04-27 | InternVL-1.5-Plus (generalist) | 0.7574 | 0.7989 | 0.8124 | 0.6425 | 0.5987 | 0.7544 | 0.8733 | 0.7306 | 0.7234 | 0.6216 | 0.6065 | 0.7386 | 0.4623
2024-01-24 | qwenvl-max (single generalist model) | 0.7341 | 0.7756 | 0.8083 | 0.6035 | 0.5717 | 0.7291 | 0.8856 | 0.6708 | 0.6892 | 0.5967 | 0.6009 | 0.7152 | 0.4388
2024-05-31 | GPT-4 Vision Turbo + Amazon Textract OCR | 0.7191 | 0.7575 | 0.7795 | 0.6591 | 0.5553 | 0.7183 | 0.8201 | 0.6696 | 0.6904 | 0.6926 | 0.5815 | 0.6759 | 0.4281
2023-07-05 | RALLM | 0.7175 | 0.7421 | 0.7884 | 0.0830 | 0.8031 | 0.6866 | 0.7088 | 0.7376 | 0.7214 | 0.8049 | 0.7141 | 0.8038 | 0.7916
2024-11-01 | MLCD-Embodied-7B: Multi-label Cluster Discrimination for Visual Representation Learning | 0.6998 | 0.7330 | 0.7930 | 0.5955 | 0.5564 | 0.6951 | 0.8271 | 0.6654 | 0.6614 | 0.5495 | 0.5523 | 0.6350 | 0.4905
2024-04-02 | InternLM-XComposer2-4KHD-7B | 0.6855 | 0.7336 | 0.7570 | 0.5151 | 0.5124 | 0.6643 | 0.8240 | 0.6598 | 0.6471 | 0.5241 | 0.5120 | 0.6636 | 0.3610
2023-11-15 | SMoLA-PaLI-X Specialist Model | 0.6621 | 0.7166 | 0.7252 | 0.5838 | 0.4292 | 0.6448 | 0.8261 | 0.6714 | 0.6110 | 0.5065 | 0.5238 | 0.5054 | 0.3506
2024-02-10 | ScreenAI 5B | 0.6590 | 0.7162 | 0.7247 | 0.5734 | 0.4140 | 0.6525 | 0.8315 | 0.5968 | 0.6020 | 0.4467 | 0.4815 | 0.5303 | 0.3000
2023-12-07 | SMoLA-PaLI-X Generalist Model | 0.6556 | 0.7107 | 0.7228 | 0.5642 | 0.4197 | 0.6200 | 0.8237 | 0.6710 | 0.6095 | 0.5246 | 0.5159 | 0.4988 | 0.3372
2024-09-08 | neetolab-sota-v1 | 0.6195 | 0.6620 | 0.7021 | 0.4814 | 0.4513 | 0.6015 | 0.7652 | 0.5505 | 0.5776 | 0.4996 | 0.4676 | 0.5528 | 0.3491
2021-04-11 | Applica.ai TILT | 0.6120 | 0.6765 | 0.6419 | 0.4391 | 0.3832 | 0.5917 | 0.7916 | 0.4545 | 0.5654 | 0.4480 | 0.4801 | 0.4958 | 0.2652
2024-07-22 | Snowflake Arctic-TILT 0.8B | 0.5695 | 0.6274 | 0.6074 | 0.4123 | 0.3653 | 0.5478 | 0.7530 | 0.4204 | 0.5109 | 0.4410 | 0.4350 | 0.5042 | 0.2238
2023-08-20 | PaLI-X (Google Research, Single Generative Model) | 0.5477 | 0.5940 | 0.6950 | 0.4122 | 0.3534 | 0.5145 | 0.6891 | 0.6373 | 0.5040 | 0.4013 | 0.4290 | 0.4053 | 0.3091
2024-09-03 | tiancaili | 0.4992 | 0.5293 | 0.7171 | 0.3317 | 0.3549 | 0.4633 | 0.6105 | 0.5544 | 0.4680 | 0.4139 | 0.4013 | 0.4113 | 0.2959
2024-05-21 | PaliGemma-3B (finetune, 896px) | 0.4775 | 0.5214 | 0.5372 | 0.3301 | 0.3220 | 0.4500 | 0.6057 | 0.4252 | 0.4377 | 0.3690 | 0.3742 | 0.3924 | 0.2507
2024-07-26 | loixc-vqa | 0.4715 | 0.5000 | 0.6815 | 0.3250 | 0.3309 | 0.4521 | 0.5853 | 0.4108 | 0.4364 | 0.3612 | 0.4006 | 0.3919 | 0.2505
2024-10-09 | llama3-qwenvit | 0.4329 | 0.5077 | 0.5162 | 0.2329 | 0.1650 | 0.4207 | 0.5568 | 0.4785 | 0.4053 | 0.3014 | 0.3371 | 0.1311 | 0.2118
2023-10-09 | nnrc_udop_224 | 0.4299 | 0.4716 | 0.5279 | 0.2410 | 0.2785 | 0.3740 | 0.5755 | 0.3475 | 0.3944 | 0.3347 | 0.2997 | 0.3583 | 0.1866
2024-05-21 | PaliGemma-3B (finetune, 448px) | 0.4047 | 0.4275 | 0.5801 | 0.2560 | 0.3007 | 0.4010 | 0.4853 | 0.3898 | 0.3742 | 0.3178 | 0.3530 | 0.3360 | 0.2517
2022-09-18 | pix2struct-large | 0.4001 | 0.4308 | 0.4839 | 0.2059 | 0.3173 | 0.3833 | 0.5256 | 0.2572 | 0.3726 | 0.3283 | 0.2762 | 0.4198 | 0.2017
2024-07-31 | tixc-vqa | 0.3975 | 0.4264 | 0.6092 | 0.2620 | 0.2496 | 0.3693 | 0.4798 | 0.3826 | 0.3704 | 0.3172 | 0.3571 | 0.2927 | 0.1965
2021-04-09 | IG-BERT (single model) | 0.3854 | 0.4181 | 0.4481 | 0.2197 | 0.2849 | 0.3373 | 0.5016 | 0.3013 | 0.3706 | 0.3347 | 0.2939 | 0.3564 | 0.2000
2022-09-18 | pix2struct-base | 0.3820 | 0.4145 | 0.4381 | 0.1655 | 0.3014 | 0.3351 | 0.4971 | 0.2380 | 0.3632 | 0.3257 | 0.2344 | 0.4036 | 0.1888
2024-10-09 | llama3-internvit | 0.3749 | 0.4294 | 0.5715 | 0.1641 | 0.1627 | 0.3721 | 0.4580 | 0.4741 | 0.3385 | 0.2350 | 0.3329 | 0.1114 | 0.2109
2024-04-23 | dolma_multifinetuning | 0.3633 | 0.3832 | 0.5660 | 0.2045 | 0.2657 | 0.3284 | 0.4570 | 0.4042 | 0.3329 | 0.2174 | 0.3117 | 0.2731 | 0.2491
2021-04-11 | NAVER CLOVA | 0.3219 | 0.3996 | 0.2317 | 0.1064 | 0.1068 | 0.2653 | 0.4488 | 0.1878 | 0.3095 | 0.3231 | 0.2020 | 0.1480 | 0.0695
2021-04-10 | Ensemble LM and VLM | 0.2853 | 0.3337 | 0.4181 | 0.0748 | 0.1169 | 0.2439 | 0.3649 | 0.2331 | 0.2645 | 0.2845 | 0.2580 | 0.1628 | 0.0647
2024-05-21 | PaliGemma-3B (finetune, 224px) | 0.2846 | 0.2888 | 0.5024 | 0.1567 | 0.2425 | 0.2675 | 0.3206 | 0.3164 | 0.2609 | 0.2406 | 0.2979 | 0.2025 | 0.2730
2021-11-09 | LayoutLMv2 LARGE | 0.2829 | 0.3430 | 0.2763 | 0.0641 | 0.1114 | 0.2449 | 0.3855 | 0.1440 | 0.2601 | 0.3110 | 0.1897 | 0.1130 | 0.1158
2022-09-20 | BROS_BASE (WebViCoB 1M) | 0.2809 | 0.3436 | 0.2485 | 0.0277 | 0.1303 | 0.2545 | 0.3620 | 0.1318 | 0.2767 | 0.2886 | 0.2207 | 0.1745 | 0.0854
2022-03-03 | InfographicVQA paper model | 0.2720 | 0.3278 | 0.2386 | 0.0450 | 0.1371 | 0.2400 | 0.3626 | 0.1705 | 0.2551 | 0.2205 | 0.1836 | 0.1559 | 0.1140
2021-04-05 | BERT fuzzy search | 0.2078 | 0.2625 | 0.2333 | 0.0739 | 0.0259 | 0.1852 | 0.2995 | 0.0896 | 0.1942 | 0.1709 | 0.1805 | 0.0160 | 0.0436
2021-04-10 | BERT | 0.1678 | 0.2149 | 0.2117 | 0.0126 | 0.0152 | 0.1479 | 0.2450 | 0.1054 | 0.1505 | 0.1768 | 0.1578 | 0.0158 | 0.0185
2024-07-13 | 0710 | 0.1407 | 0.1449 | 0.2181 | 0.0674 | 0.1252 | 0.1294 | 0.1612 | 0.1334 | 0.1368 | 0.1041 | 0.1261 | 0.1397 | 0.1072
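For reference, all scores above are Average Normalized Levenshtein Similarity (ANLS) values: for each question, the maximum similarity between the prediction and any ground-truth answer is taken, and similarities below the threshold τ = 0.5 are zeroed out before averaging. Below is a minimal sketch of the metric; it is not the official evaluation script, and the function names `levenshtein` and `anls` are illustrative.

```python
# Minimal ANLS sketch (illustrative, not the official evaluation code).
# Per question: score = max over ground-truth answers of 1 - NL(pred, gt),
# set to 0 if below tau = 0.5; ANLS is the mean over all questions.

def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def anls(predictions: list[str], answers: list[list[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions."""
    total = 0.0
    for pred, gts in zip(predictions, answers):
        best = 0.0
        for gt in gts:
            p, g = pred.strip().lower(), gt.strip().lower()
            denom = max(len(p), len(g)) or 1  # guard against two empty strings
            best = max(best, 1.0 - levenshtein(p, g) / denom)
        total += best if best >= tau else 0.0
    return total / len(predictions)

# Example: one exact match, one near miss, one wrong answer.
print(anls(["42%", "new yrok", "cat"],
           [["42%"], ["new york"], ["dog"]]))
```

In this example the exact match scores 1.0, the near miss scores 0.75 (two edits over eight characters), and the wrong answer falls below τ and scores 0, giving an ANLS of about 0.58.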