- Task 1 - Single Page Document VQA
- Task 2 - Document Collection VQA
- Task 3 - Infographics VQA
- Task 4 - MP-DocVQA
Method: Snowflake Arctic-TILT 0.8B
Date: 2024-08-21
Authors: Snowflake Document AI team
Affiliation: Snowflake
Description: An improved version of the Applica.ai TILT model, with better text-vision modality fusion, long-context support, and a better training procedure. The submitted results come from a single model.
Method: GRAM
Date: 2024-01-16
Authors: Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Shahar Tsiper, Elad Ben Avraham, Aviad Aberdam, Ron Litman
Affiliation: AWS AI Labs and Technion Israel
Description: GRAM model based on DocFormerv2, trained on the Multi-Page DocVQA dataset.
Method: GRAM C-Former
Date: 2024-01-16
Authors: Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Shahar Tsiper, Elad Ben Avraham, Aviad Aberdam, Ron Litman
Affiliation: AWS AI Labs and Technion Israel
Description: GRAM model with C-Former, based on DocFormerv2 and trained on the Multi-Page DocVQA dataset.

Results: answer ANLS, page prediction accuracy, and ANLS per answer page position (columns Page 0 to Page 19).

Date | Method | Answer ANLS | Page prediction accuracy (%) | Page 0 | Page 1 | Page 2 | Page 3 | Page 4 | Page 5 | Page 6 | Page 7 | Page 8 | Page 9 | Page 10 | Page 11 | Page 12 | Page 13 | Page 14 | Page 15 | Page 16 | Page 17 | Page 18 | Page 19
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2024-08-21 | Snowflake Arctic-TILT 0.8B | 0.8122 | 50.7870 | 0.8639 | 0.7967 | 0.7551 | 0.7312 | 0.7105 | 0.7837 | 0.6916 | 0.7239 | 0.7793 | 0.6648 | 0.7817 | 0.6445 | 0.7003 | 0.6393 | 0.7202 | 0.6364 | 0.7355 | 0.5650 | 0.6923 | 1.0000
2024-01-16 | GRAM | 0.8032 | 19.9841 | 0.8380 | 0.7854 | 0.7528 | 0.7908 | 0.7452 | 0.7922 | 0.7459 | 0.7229 | 0.7464 | 0.7102 | 0.8120 | 0.6905 | 0.7589 | 0.6473 | 0.5714 | 0.5909 | 0.7454 | 0.6367 | 0.8846 | 1.0000
2024-01-16 | GRAM C-Former | 0.7812 | 19.9841 | 0.8152 | 0.7659 | 0.7363 | 0.7569 | 0.7164 | 0.7238 | 0.7407 | 0.7180 | 0.7587 | 0.8003 | 0.7624 | 0.6771 | 0.7713 | 0.6772 | 0.5798 | 0.6172 | 0.6394 | 0.5664 | 0.8327 | 0.9250
2024-02-18 | ScreenAI 5B | 0.7711 | 77.8840 | 0.8304 | 0.7394 | 0.7261 | 0.7407 | 0.6100 | 0.7213 | 0.6454 | 0.6389 | 0.6573 | 0.7500 | 0.7262 | 0.7429 | 0.6295 | 0.5147 | 0.5932 | 0.6818 | 0.5383 | 0.5900 | 0.6154 | 0.9605
2023-10-03 | (OCR-Free) Retrieval-based Baseline | 0.6199 | 81.5501 | 0.6755 | 0.5954 | 0.5802 | 0.5611 | 0.4986 | 0.4989 | 0.5760 | 0.4991 | 0.6062 | 0.6652 | 0.5665 | 0.3438 | 0.4470 | 0.4171 | 0.3713 | 0.5909 | 0.4321 | 0.2575 | 0.7308 | 0.9605
2023-03-28 | Hi-VT5 | 0.6184 | 79.6374 | 0.6571 | 0.6055 | 0.5907 | 0.5450 | 0.5259 | 0.5431 | 0.6747 | 0.6113 | 0.5971 | 0.7997 | 0.5291 | 0.3694 | 0.5466 | 0.3373 | 0.4144 | 0.3879 | 0.4835 | 0.4001 | 0.6187 | 1.0000
2023-02-14 | (Baseline) Longformer base concat | 0.5287 | 71.1696 | 0.6293 | 0.4746 | 0.4495 | 0.4371 | 0.3966 | 0.3889 | 0.4451 | 0.3883 | 0.4805 | 0.5049 | 0.2860 | 0.1888 | 0.0861 | 0.1600 | 0.1726 | 0.2448 | 0.1486 | 0.1912 | 0.1154 | 0.6625
2023-02-14 | (Baseline) T5 base concat | 0.5050 | 0.0000 | 0.7122 | 0.4390 | 0.2567 | 0.2081 | 0.1498 | 0.1533 | 0.2186 | 0.1415 | 0.1301 | 0.3135 | 0.1108 | 0.0829 | 0.0866 | 0.0774 | 0.0873 | 0.0481 | 0.1648 | 0.2240 | 0.0000 | 0.3875
2023-02-14 | (Baseline) BigBird ITC base concat | 0.4929 | 67.5433 | 0.6506 | 0.4529 | 0.3729 | 0.2883 | 0.1890 | 0.1726 | 0.1681 | 0.1962 | 0.1887 | 0.2957 | 0.1802 | 0.0800 | 0.0829 | 0.0595 | 0.0238 | 0.1993 | 0.0778 | 0.1400 | 0.0769 | 0.2375
2023-02-14 | (Baseline) LayoutLMv3 base - concat | 0.4538 | 51.9426 | 0.6624 | 0.3962 | 0.2020 | 0.1105 | 0.1609 | 0.0494 | 0.1165 | 0.0467 | 0.0596 | 0.3198 | 0.0980 | 0.0800 | 0.0433 | 0.1131 | 0.0000 | 0.0455 | 0.0978 | 0.1467 | 0.0385 | 0.2105
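
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how ANLS (Average Normalized Levenshtein Similarity) and page prediction accuracy are commonly computed. It is not the competition's evaluation script. The function names, the lowercase-and-strip normalization, and the 0.5 threshold are assumptions based on the usual definition of ANLS, so the official scorer may differ in details.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best similarity against any accepted answer.

    predictions: list of predicted answer strings.
    references: list of lists of accepted answer strings, one list per question.
    threshold: assumed cutoff; a normalized distance at or above it scores 0.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            distance = levenshtein(p, g) / longest if longest else 0.0
            if distance < threshold:
                best = max(best, 1.0 - distance)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


def page_accuracy(predicted_pages, answer_pages):
    """Percentage of questions whose predicted page index matches the answer page."""
    if not answer_pages:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_pages, answer_pages))
    return 100.0 * correct / len(answer_pages)


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    print(anls(["abcd", "xyz"], [["abcx"], ["abc"]]))
    print(page_accuracy([0, 1, 2], [0, 1, 3]))
```

Under this definition, a method that does not output a page index at all scores 0 on page prediction accuracy while still obtaining a non-zero answer ANLS.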