- Task 1 - Single Page Document VQA
- Task 2 - Document Collection VQA
- Task 3 - Infographics VQA
- Task 4 - MP-DocVQA
Method: Human Performance (2020-06-13)
Authors: DocVQA Organizers
Affiliation: CVIT, IIIT Hyderabad, CVC-UAB, Amazon
Description: Human performance on the test set. A small group of volunteers was asked to enter an answer for each given question and image.
Method: SMoLA-PaLI-X Specialist Model (2023-11-15)
Authors: SMoLA PaLI Team
Affiliation: Google Research
Description: Omni-SMoLA uses the Soft MoE approach to (softly) mix many multimodal low-rank experts. The specialist model is further LoRA-tuned on the DocVQA task, starting from the SMoLA-PaLI-X generalist model.
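The core idea described above, softly mixing many low-rank experts per token via a learned router, can be sketched in a few lines of NumPy. This is an illustrative toy, not the actual Omni-SMoLA implementation; all names (`smola_layer`, the array shapes, the single-layer setup) are assumptions for the sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def smola_layer(x, W_base, lora_A, lora_B, W_router):
    """Softly mix E low-rank (LoRA-style) experts per token.

    x:        (T, d_in)     token activations
    W_base:   (d_in, d_out) frozen base weight
    lora_A:   (E, d_in, r)  expert down-projections
    lora_B:   (E, r, d_out) expert up-projections
    W_router: (d_in, E)     router producing per-token expert weights
    """
    gates = softmax(x @ W_router)  # (T, E); each token's gates sum to 1
    # Per-expert low-rank deltas: (T, E, d_out)
    deltas = np.einsum("td,edr,ero->teo", x, lora_A, lora_B)
    # Base output plus the softly-gated mixture of expert deltas
    return x @ W_base + np.einsum("te,teo->to", gates, deltas)
```

With `lora_B` zeroed out (the usual LoRA initialization), the layer reduces exactly to the frozen base projection `x @ W_base`, so the experts start as a no-op and are shaped by fine-tuning.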
Method: Qwen-VL (2023-11-28)
Authors: Qwen Team
Affiliation: Alibaba Group
Description: Qwen-VL: a versatile vision-language model for understanding, localization, text reading, and beyond.
Just a single model! Give it a go at https://tongyi.aliyun.com/qianwen
Date | Method | Score | Figure/Diagram | Form | Table/List | Layout | Free_text | Image/Photo | Handwritten | Yes/No | Others
---|---|---|---|---|---|---|---|---|---|---|---
2020-06-13 | Human Performance | 0.9811 | 0.9756 | 0.9825 | 0.9780 | 0.9845 | 0.9839 | 0.9740 | 0.9717 | 0.9974 | 0.9828
2023-11-15 | SMoLA-PaLI-X Specialist Model | 0.9084 | 0.7790 | 0.9416 | 0.8934 | 0.9262 | 0.9188 | 0.7911 | 0.8508 | 0.8966 | 0.8456
2023-11-28 | qwenvl | 0.9024 | 0.7947 | 0.9321 | 0.8848 | 0.9215 | 0.9106 | 0.7925 | 0.8524 | 0.7241 | 0.8795
2022-10-08 | BAIDU-DI | 0.9016 | 0.6823 | 0.9186 | 0.9139 | 0.9138 | 0.9234 | 0.6841 | 0.7949 | 0.6181 | 0.8344
2022-03-31 | Tencent Youtu | 0.8866 | 0.7576 | 0.9470 | 0.8932 | 0.8821 | 0.8654 | 0.6680 | 0.8877 | 0.4828 | 0.8413
2022-01-13 | ERNIE-Layout 2.0 | 0.8841 | 0.6434 | 0.9177 | 0.8996 | 0.8899 | 0.9010 | 0.6223 | 0.7836 | 0.6124 | 0.8118
2023-07-07 | DocFormerv2 (Single Model with 750M Parameters) | 0.8784 | 0.6680 | 0.9382 | 0.9076 | 0.8676 | 0.8555 | 0.5840 | 0.8123 | 0.8276 | 0.8070
2021-11-26 | Mybank-DocReader | 0.8755 | 0.6682 | 0.9233 | 0.8763 | 0.8896 | 0.8713 | 0.6290 | 0.8047 | 0.5805 | 0.7804
2021-09-06 | ERNIE-Layout 1.0 | 0.8753 | 0.6586 | 0.8972 | 0.8864 | 0.8902 | 0.8943 | 0.6392 | 0.7331 | 0.5434 | 0.8115
2021-02-12 | Applica.ai TILT | 0.8705 | 0.6082 | 0.9459 | 0.8980 | 0.8592 | 0.8581 | 0.5508 | 0.8139 | 0.6897 | 0.7788
2023-05-31 | PaLI-X (Google Research; Single Generative Model) | 0.8679 | 0.6971 | 0.8992 | 0.8400 | 0.8955 | 0.8925 | 0.7589 | 0.7209 | 0.8966 | 0.8468
2020-12-22 | LayoutLM 2.0 (single model) | 0.8672 | 0.6574 | 0.8953 | 0.8769 | 0.8791 | 0.8707 | 0.7287 | 0.6729 | 0.5517 | 0.8103
2020-08-16 | Alibaba DAMO NLP | 0.8506 | 0.6650 | 0.8809 | 0.8552 | 0.8733 | 0.8397 | 0.6758 | 0.7691 | 0.5492 | 0.7526
2020-05-16 | PingAn-OneConnect-Gammalab-DQA | 0.8484 | 0.6059 | 0.9021 | 0.8463 | 0.8730 | 0.8337 | 0.5812 | 0.7692 | 0.5172 | 0.7289
2023-02-21 | LayoutLMv2_star_seg_large | 0.8430 | 0.7008 | 0.8737 | 0.8389 | 0.8536 | 0.8498 | 0.6872 | 0.7823 | 0.6181 | 0.8252
2023-06-30 | LATIN-Prompt + Claude (Zero shot) | 0.8336 | 0.6601 | 0.8553 | 0.8584 | 0.8169 | 0.8726 | 0.6021 | 0.6774 | 0.7126 | 0.8258
2023-12-01 | nnrc mplugowl2_9k | 0.8281 | 0.5780 | 0.8949 | 0.7860 | 0.8662 | 0.8631 | 0.6302 | 0.8054 | 0.5517 | 0.7867
2023-11-27 | 36_nnrc_llama2 | 0.8239 | 0.5404 | 0.8787 | 0.7958 | 0.8475 | 0.8813 | 0.5995 | 0.7991 | 0.6897 | 0.7922
2023-10-09 | nnrc_udop_224_grad_acc16 | 0.8039 | 0.5469 | 0.8731 | 0.8016 | 0.8187 | 0.7989 | 0.5215 | 0.6783 | 0.5862 | 0.6811
2023-05-06 | Docugami-Layout | 0.8031 | 0.5176 | 0.8875 | 0.7902 | 0.8214 | 0.8026 | 0.5089 | 0.7753 | 0.4224 | 0.7022
2023-09-27 | nnrc_udop_224 | 0.8023 | 0.5628 | 0.8704 | 0.8019 | 0.8163 | 0.7981 | 0.5356 | 0.6731 | 0.5862 | 0.6861
2022-01-07 | LayoutLMV2-large on Textract | 0.7873 | 0.4924 | 0.8771 | 0.8218 | 0.7726 | 0.7661 | 0.4820 | 0.7276 | 0.3793 | 0.6983
2023-01-29 | LayoutLMv2_star_seg | 0.7859 | 0.5328 | 0.8406 | 0.7859 | 0.8128 | 0.7909 | 0.4879 | 0.6468 | 0.3644 | 0.6953
2023-05-25 | YoBerDaV2 Single-page | 0.7749 | 0.4737 | 0.8894 | 0.7586 | 0.7962 | 0.7398 | 0.4763 | 0.7173 | 0.7586 | 0.6976
2020-05-14 | Structural LM-v2 | 0.7674 | 0.4931 | 0.8381 | 0.7621 | 0.7924 | 0.7596 | 0.4756 | 0.6282 | 0.5517 | 0.6549
2022-09-18 | pix2struct-large | 0.7656 | 0.4424 | 0.8827 | 0.7702 | 0.7774 | 0.7085 | 0.5383 | 0.6320 | 0.7586 | 0.6536
2022-12-28 | Submission_ErnieLayout_base_finetuned_on_DocVQA_en_train_dev_textract_word_segments_ck-14000 | 0.7599 | 0.4313 | 0.8678 | 0.7726 | 0.7641 | 0.7330 | 0.4598 | 0.6957 | 0.4828 | 0.6097
2020-05-15 | QA_Base_MRC_2 | 0.7415 | 0.4854 | 0.8015 | 0.6738 | 0.7943 | 0.8136 | 0.5740 | 0.5831 | 0.5287 | 0.7161
2020-05-15 | QA_Base_MRC_1 | 0.7407 | 0.4890 | 0.7984 | 0.6675 | 0.7936 | 0.8131 | 0.5854 | 0.6099 | 0.4943 | 0.7384
2020-05-15 | QA_Base_MRC_4 | 0.7348 | 0.4735 | 0.8040 | 0.6647 | 0.7838 | 0.8043 | 0.5618 | 0.5810 | 0.4598 | 0.7332
2020-05-15 | QA_Base_MRC_3 | 0.7322 | 0.4852 | 0.7958 | 0.6562 | 0.7842 | 0.8044 | 0.5679 | 0.5730 | 0.4511 | 0.7171
2020-05-15 | QA_Base_MRC_5 | 0.7274 | 0.4858 | 0.7877 | 0.6550 | 0.7754 | 0.8047 | 0.5405 | 0.5619 | 0.4598 | 0.7084
2022-09-18 | pix2struct-base | 0.7213 | 0.4111 | 0.8386 | 0.7253 | 0.7503 | 0.6407 | 0.4211 | 0.5753 | 0.6552 | 0.5822
2023-01-27 | LayoutLM-base+GNN | 0.6984 | 0.4747 | 0.7973 | 0.6848 | 0.7322 | 0.6323 | 0.4398 | 0.5599 | 0.5431 | 0.5388
2021-12-05 | Electra Large Squad | 0.6961 | 0.4485 | 0.7703 | 0.6348 | 0.7364 | 0.7644 | 0.4594 | 0.5438 | 0.5172 | 0.6470
2023-05-25 | YoBerDaV1 Multi-page | 0.6904 | 0.3481 | 0.8335 | 0.6411 | 0.7253 | 0.6854 | 0.4191 | 0.6299 | 0.5517 | 0.6129
2020-05-16 | HyperDQA_V4 | 0.6893 | 0.3874 | 0.7792 | 0.6309 | 0.7478 | 0.7187 | 0.4867 | 0.5630 | 0.4138 | 0.5685
2020-05-16 | HyperDQA_V3 | 0.6769 | 0.3876 | 0.7774 | 0.6167 | 0.7332 | 0.6961 | 0.4296 | 0.5373 | 0.4138 | 0.5650
2023-07-06 | GPT3.5 | 0.6759 | 0.4741 | 0.7144 | 0.6524 | 0.7036 | 0.6858 | 0.5385 | 0.5038 | 0.5954 | 0.6660
2020-05-16 | HyperDQA_V2 | 0.6734 | 0.3818 | 0.7666 | 0.6110 | 0.7332 | 0.6867 | 0.4834 | 0.5560 | 0.3793 | 0.5902
2020-05-09 | HyperDQA_V1 | 0.6717 | 0.4013 | 0.7693 | 0.6197 | 0.7167 | 0.6922 | 0.3598 | 0.5596 | 0.4138 | 0.5504
2023-08-15 | LATIN-Tuning-Prompt + Alpaca (Zero-shot) | 0.6687 | 0.3732 | 0.7529 | 0.6545 | 0.6615 | 0.7463 | 0.5439 | 0.4941 | 0.3481 | 0.6831
2023-07-14 | donut_base | 0.6590 | 0.3960 | 0.8407 | 0.6604 | 0.6987 | 0.4630 | 0.2969 | 0.6964 | 0.0345 | 0.5057
2022-09-22 | BROS_BASE (WebViCoB 6.4M) | 0.6563 | 0.3780 | 0.7757 | 0.6681 | 0.6557 | 0.6175 | 0.3497 | 0.5782 | 0.4224 | 0.5754
2023-09-24 | Layoutlm_DocVQA+Token_v2 | 0.6562 | 0.3935 | 0.7764 | 0.6228 | 0.6737 | 0.6711 | 0.3385 | 0.5109 | 0.5086 | 0.5515
2023-07-21 | donut_half_input_imageSize | 0.6536 | 0.3930 | 0.8366 | 0.6548 | 0.6950 | 0.4609 | 0.2486 | 0.6940 | 0.0345 | 0.4941
2021-12-04 | Bert Large | 0.6447 | 0.3502 | 0.7535 | 0.5488 | 0.6920 | 0.7266 | 0.4171 | 0.5254 | 0.5517 | 0.6076
2022-05-23 | Dessurt | 0.6322 | 0.3164 | 0.8058 | 0.6486 | 0.6520 | 0.4852 | 0.2862 | 0.5830 | 0.3793 | 0.4365
2023-11-09 | DocLLM-1B | 0.5924 | 0.3755 | 0.7582 | 0.5057 | 0.6499 | 0.5338 | 0.3647 | 0.5327 | 0.5724 | 0.4814
2020-05-09 | bert fulldata fintuned | 0.5900 | 0.4169 | 0.6870 | 0.4269 | 0.6710 | 0.7315 | 0.5124 | 0.4900 | 0.4483 | 0.5907
2020-05-01 | bert finetuned | 0.5872 | 0.2986 | 0.7011 | 0.4849 | 0.6359 | 0.6933 | 0.4622 | 0.4751 | 0.4483 | 0.4895
2020-04-30 | HyperDQA_V0 | 0.5715 | 0.3131 | 0.6780 | 0.4732 | 0.6630 | 0.5716 | 0.3623 | 0.4351 | 0.3793 | 0.4941
2023-09-26 | LayoutLM_Docvqa+Token_v0 | 0.4980 | 0.2319 | 0.6035 | 0.4320 | 0.5684 | 0.4779 | 0.2768 | 0.3081 | 0.1293 | 0.4178
2022-04-27 | LayoutLMv2, Tesseract OCR eval (dataset OCR trained) | 0.4961 | 0.2544 | 0.5523 | 0.4177 | 0.5495 | 0.5914 | 0.2888 | 0.1361 | 0.2069 | 0.4187
2022-03-29 | LayoutLMv2, Tesseract OCR eval (Tesseract OCR trained) | 0.4815 | 0.2253 | 0.5440 | 0.4216 | 0.5207 | 0.5709 | 0.2430 | 0.1353 | 0.3103 | 0.3859
2023-07-26 | donut_large_encoderSize_finetuned_20_epoch | 0.4673 | 0.2236 | 0.6691 | 0.4581 | 0.5026 | 0.2665 | 0.1356 | 0.4983 | 0.5734 | 0.3430
2020-04-27 | bert | 0.4557 | 0.2233 | 0.5259 | 0.2633 | 0.5113 | 0.7775 | 0.4859 | 0.3565 | 0.0345 | 0.5778
2020-05-16 | UGLIFT v0.1 (Clova OCR) | 0.4417 | 0.1766 | 0.5600 | 0.3178 | 0.5340 | 0.4520 | 0.2253 | 0.3573 | 0.4483 | 0.3356
2022-10-21 | Finetuning LayoutLMv3_Base | 0.3596 | 0.2102 | 0.4498 | 0.3858 | 0.3262 | 0.3496 | 0.1552 | 0.3404 | 0.0345 | 0.2706
2023-09-19 | testtest | 0.3569 | 0.3018 | 0.3407 | 0.2748 | 0.4693 | 0.3186 | 0.2682 | 0.2753 | 0.6207 | 0.3356
2020-05-14 | Plain BERT QA | 0.3524 | 0.1687 | 0.4489 | 0.2029 | 0.4321 | 0.4812 | 0.3517 | 0.3096 | 0.0345 | 0.3747
2020-05-16 | Clova OCR V0 | 0.3489 | 0.0977 | 0.4855 | 0.2670 | 0.3811 | 0.3958 | 0.2489 | 0.2875 | 0.0345 | 0.3062
2020-05-01 | HDNet | 0.3401 | 0.2040 | 0.4688 | 0.2181 | 0.4710 | 0.1916 | 0.2488 | 0.2736 | 0.1379 | 0.2458
2020-05-16 | CLOVA OCR | 0.3296 | 0.1246 | 0.4612 | 0.2455 | 0.3622 | 0.3746 | 0.1692 | 0.2736 | 0.0690 | 0.3205
2023-07-21 | donut_small_encoderSize_finetuned_20_epoch | 0.3157 | 0.1935 | 0.4417 | 0.2912 | 0.3400 | 0.2075 | 0.1495 | 0.2658 | 0.3103 | 0.2644
2020-04-29 | docVQAQV_V0.1 | 0.3016 | 0.2010 | 0.3898 | 0.3810 | 0.2933 | 0.0664 | 0.1842 | 0.2736 | 0.1586 | 0.1695
2020-04-26 | docVQAQV_V0 | 0.2342 | 0.1646 | 0.3133 | 0.2623 | 0.2483 | 0.0549 | 0.2277 | 0.1856 | 0.1034 | 0.1635
2021-02-08 | seq2seq | 0.1081 | 0.0758 | 0.1283 | 0.0829 | 0.1332 | 0.0822 | 0.0786 | 0.0779 | 0.4828 | 0.1052
2020-06-16 | Test Submission | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
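The scores in the table are ANLS (Average Normalized Levenshtein Similarity), the standard DocVQA metric, which credits partial string matches down to a similarity threshold of 0.5. A minimal sketch of the per-question score is below; this is an illustration of the metric's definition, not the official evaluation script, and the function names are ours:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, gold_answers: list[str], tau: float = 0.5) -> float:
    """Per-question ANLS: best normalized similarity over the gold answers,
    zeroed out when it falls below the threshold tau."""
    best = 0.0
    for gold in gold_answers:
        p, g = prediction.strip().lower(), gold.strip().lower()
        if not p and not g:
            nls = 1.0
        else:
            nls = 1.0 - levenshtein(p, g) / max(len(p), len(g))
        best = max(best, nls)
    return best if best >= tau else 0.0
```

The leaderboard score would then be the mean of this quantity over all test questions (and the per-category columns, the same mean restricted to questions of that type).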