Method: Human Performance (2022-03-02)
Authors: DocVQA Organizers
Affiliation: CVC-UAB, IIIT Hyderabad
Email: docvqa@cvc.uab.es
Description: Human performance as reported in InfographicVQA paper
Method: RALLM (2023-07-05)
Authors: xyy
Description: RALLM
Method: Applica.ai TILT (2021-04-11)
Authors: Applica.ai Research Team
Affiliation: Applica.ai
Email: rafal.powalski@applica.ai, dawid.jurkiewicz@applica.ai
Description: TILT is a neural network architecture that simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on an encoder-decoder architecture. Results were obtained from a single TILT-Large model pre-trained as described in the paper. The model was fine-tuned on the challenge training set.
Per-category columns are grouped by answer type (Image span, Question span, Multiple spans, Non span), evidence (Table/List, Textual, Visual object, Figure, Map) and operation (Comparison, Arithmetic, Counting).

Date | Method | Score | Image span | Question span | Multiple spans | Non span | Table/List | Textual | Visual object | Figure | Map | Comparison | Arithmetic | Counting
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2022-03-02 | Human Performance | 0.9718 | 0.9745 | 0.9777 | 0.9335 | 0.9716 | 0.9780 | 0.9789 | 0.9770 | 0.9699 | 0.9433 | 0.9712 | 0.9837 | 0.9544
2023-07-05 | RALLM | 0.7175 | 0.7421 | 0.7884 | 0.0830 | 0.8031 | 0.6866 | 0.7088 | 0.7376 | 0.7214 | 0.8049 | 0.7141 | 0.8038 | 0.7916
2021-04-11 | Applica.ai TILT | 0.6120 | 0.6765 | 0.6419 | 0.4391 | 0.3832 | 0.5917 | 0.7916 | 0.4545 | 0.5654 | 0.4480 | 0.4801 | 0.4958 | 0.2652
2023-08-20 | PaLI-X (Google Research, Single Generative Model) | 0.5477 | 0.5940 | 0.6950 | 0.4122 | 0.3534 | 0.5145 | 0.6891 | 0.6373 | 0.5040 | 0.4013 | 0.4290 | 0.4053 | 0.3091
2023-10-09 | nnrc_udop_224 | 0.4299 | 0.4716 | 0.5279 | 0.2410 | 0.2785 | 0.3740 | 0.5755 | 0.3475 | 0.3944 | 0.3347 | 0.2997 | 0.3583 | 0.1866
2022-09-18 | pix2struct-large | 0.4001 | 0.4308 | 0.4839 | 0.2059 | 0.3173 | 0.3833 | 0.5256 | 0.2572 | 0.3726 | 0.3283 | 0.2762 | 0.4198 | 0.2017
2021-04-09 | IG-BERT (single model) | 0.3854 | 0.4181 | 0.4481 | 0.2197 | 0.2849 | 0.3373 | 0.5016 | 0.3013 | 0.3706 | 0.3347 | 0.2939 | 0.3564 | 0.2000
2022-09-18 | pix2struct-base | 0.3820 | 0.4145 | 0.4381 | 0.1655 | 0.3014 | 0.3351 | 0.4971 | 0.2380 | 0.3632 | 0.3257 | 0.2344 | 0.4036 | 0.1888
2021-04-11 | NAVER CLOVA | 0.3219 | 0.3997 | 0.2317 | 0.1064 | 0.1068 | 0.2653 | 0.4488 | 0.1878 | 0.3095 | 0.3231 | 0.2020 | 0.1480 | 0.0695
2021-04-10 | Ensemble LM and VLM | 0.2853 | 0.3337 | 0.4181 | 0.0748 | 0.1169 | 0.2439 | 0.3649 | 0.2331 | 0.2645 | 0.2845 | 0.2580 | 0.1628 | 0.0647
2021-11-09 | LayoutLMv2 LARGE | 0.2829 | 0.3430 | 0.2763 | 0.0641 | 0.1114 | 0.2449 | 0.3855 | 0.1440 | 0.2601 | 0.3110 | 0.1897 | 0.1130 | 0.1158
2022-09-20 | BROS_BASE (WebViCoB 1M) | 0.2809 | 0.3436 | 0.2485 | 0.0277 | 0.1303 | 0.2545 | 0.3620 | 0.1318 | 0.2767 | 0.2886 | 0.2207 | 0.1745 | 0.0854
2022-03-03 | InfographicVQA paper model | 0.2720 | 0.3278 | 0.2386 | 0.0450 | 0.1371 | 0.2400 | 0.3626 | 0.1705 | 0.2551 | 0.2205 | 0.1836 | 0.1559 | 0.1140
2021-04-05 | BERT fuzzy search | 0.2078 | 0.2625 | 0.2333 | 0.0739 | 0.0259 | 0.1852 | 0.2995 | 0.0896 | 0.1942 | 0.1709 | 0.1805 | 0.0160 | 0.0436
2021-04-10 | BERT | 0.1678 | 0.2149 | 0.2117 | 0.0126 | 0.0152 | 0.1479 | 0.2450 | 0.1054 | 0.1505 | 0.1768 | 0.1578 | 0.0158 | 0.0185
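The leaderboard does not say how the Score column is computed. InfographicVQA is, to our understanding, evaluated with ANLS (Average Normalized Levenshtein Similarity) using a threshold of τ = 0.5. Assuming that is what Score reports, the metric can be sketched as below: for each question, take the best similarity `1 - NL` over all accepted answers (zeroed when the normalized edit distance `NL` reaches τ), then average over questions. The function names and the lowercase/strip normalization are assumptions for illustration, not the official evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity (hypothetical helper).

    predictions:  one predicted answer string per question.
    gold_answers: a list of accepted answer strings per question.
    """
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Similarity counts only when the normalized distance is below tau.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions) if predictions else 0.0


if __name__ == "__main__":
    # One near-miss ("19955" vs "1995") and one wrong answer.
    print(anls(["19955", "cat"], [["1995"], ["dog"]]))
```

The threshold is what lets ANLS give partial credit for OCR-style slips (the near-miss above scores 0.8) while giving zero to answers that are mostly wrong.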