- Task 1 - Single Page Document VQA
- Task 2 - Document Collection VQA
- Task 3 - Infographics VQA
- Task 4 - MP-DocVQA
method: Human Performance (2020-06-13)
Authors: Task1 Organizers
Affiliation: CVIT, IIIT Hyderabad, CVC-UAB, Amazon
Description: Human performance on the test set.
A small group of volunteers was asked to enter an answer for each given question and image.
method: BAIDU-DI (2022-10-08)
Authors: NLP-Document-Intelligence
Affiliation: Baidu
Description: 1. We ensemble ERNIE-Layout and DocPrompt;
2. DocPrompt is a few-shot model trained in multiple stages on top of ERNIE-Layout;
3. DocPrompt demo: https://huggingface.co/spaces/PaddlePaddle/ERNIE-Layout;
4. ERNIE-Layout benchmark code: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-layout
method: Tencent Youtu (2022-03-31)
Authors: Tencent Youtu Lab
Description: 1. We train a layout model to sort bounding boxes.
2. The text, layout, and visual information are utilized by our DocLM, a pretrained model for visually rich document understanding.
Date | Method | Score | Figure/Diagram | Form | Table/List | Layout | Free_text | Image/Photo | Handwritten | Yes/No | Others
---|---|---|---|---|---|---|---|---|---|---|---
2020-06-13 | Human Performance | 0.9811 | 0.9756 | 0.9825 | 0.9780 | 0.9845 | 0.9839 | 0.9740 | 0.9717 | 0.9974 | 0.9828
2022-10-08 | BAIDU-DI | 0.9016 | 0.6823 | 0.9186 | 0.9139 | 0.9138 | 0.9234 | 0.6841 | 0.7949 | 0.6181 | 0.8344
2022-03-31 | Tencent Youtu | 0.8866 | 0.7576 | 0.9470 | 0.8932 | 0.8821 | 0.8654 | 0.6680 | 0.8877 | 0.4828 | 0.8413
2022-01-13 | ERNIE-Layout 2.0 | 0.8841 | 0.6434 | 0.9177 | 0.8996 | 0.8899 | 0.9010 | 0.6223 | 0.7836 | 0.6124 | 0.8118
2021-11-26 | Mybank-DocReader | 0.8755 | 0.6682 | 0.9233 | 0.8763 | 0.8896 | 0.8713 | 0.6290 | 0.8047 | 0.5805 | 0.7804
2021-09-06 | ERNIE-Layout 1.0 | 0.8753 | 0.6586 | 0.8972 | 0.8864 | 0.8902 | 0.8943 | 0.6392 | 0.7331 | 0.5434 | 0.8115
2021-02-12 | Applica.ai TILT | 0.8705 | 0.6082 | 0.9459 | 0.8980 | 0.8592 | 0.8581 | 0.5508 | 0.8139 | 0.6897 | 0.7788
2023-05-31 | PaLI-X (Google Research; Single Generative Model) | 0.8679 | 0.6971 | 0.8992 | 0.8400 | 0.8955 | 0.8925 | 0.7589 | 0.7209 | 0.8966 | 0.8468
2020-12-22 | LayoutLM 2.0 (single model) | 0.8672 | 0.6574 | 0.8953 | 0.8769 | 0.8791 | 0.8707 | 0.7287 | 0.6729 | 0.5517 | 0.8103
2020-08-16 | Alibaba DAMO NLP | 0.8506 | 0.6650 | 0.8809 | 0.8552 | 0.8733 | 0.8397 | 0.6758 | 0.7691 | 0.5492 | 0.7526
2020-05-16 | PingAn-OneConnect-Gammalab-DQA | 0.8484 | 0.6059 | 0.9021 | 0.8463 | 0.8730 | 0.8337 | 0.5812 | 0.7692 | 0.5172 | 0.7289
2023-02-21 | LayoutLMv2_star_seg_large | 0.8430 | 0.7008 | 0.8737 | 0.8389 | 0.8536 | 0.8498 | 0.6872 | 0.7823 | 0.6181 | 0.8252
2023-05-06 | Docugami-Layout | 0.8031 | 0.5176 | 0.8875 | 0.7902 | 0.8214 | 0.8026 | 0.5089 | 0.7753 | 0.4224 | 0.7022
2022-01-07 | LayoutLMV2-large on Textract | 0.7873 | 0.4924 | 0.8771 | 0.8218 | 0.7726 | 0.7661 | 0.4820 | 0.7276 | 0.3793 | 0.6983
2023-01-29 | LayoutLMv2_star_seg | 0.7859 | 0.5328 | 0.8406 | 0.7859 | 0.8128 | 0.7909 | 0.4879 | 0.6468 | 0.3644 | 0.6953
2023-05-25 | YoBerDaV2 Single-page | 0.7749 | 0.4737 | 0.8894 | 0.7586 | 0.7962 | 0.7398 | 0.4763 | 0.7173 | 0.7586 | 0.6976
2020-05-14 | Structural LM-v2 | 0.7674 | 0.4931 | 0.8381 | 0.7621 | 0.7924 | 0.7596 | 0.4756 | 0.6282 | 0.5517 | 0.6549
2022-09-18 | pix2struct-large | 0.7656 | 0.4424 | 0.8827 | 0.7702 | 0.7774 | 0.7085 | 0.5383 | 0.6320 | 0.7586 | 0.6536
2022-12-28 | Submission_ErnieLayout_base_finetuned_on_DocVQA_en_train_dev_textract_word_segments_ck-14000 | 0.7599 | 0.4313 | 0.8678 | 0.7726 | 0.7641 | 0.7330 | 0.4598 | 0.6957 | 0.4828 | 0.6097
2020-05-15 | QA_Base_MRC_2 | 0.7415 | 0.4854 | 0.8015 | 0.6738 | 0.7943 | 0.8136 | 0.5740 | 0.5831 | 0.5287 | 0.7161
2020-05-15 | QA_Base_MRC_1 | 0.7407 | 0.4890 | 0.7984 | 0.6675 | 0.7936 | 0.8131 | 0.5854 | 0.6099 | 0.4943 | 0.7384
2020-05-15 | QA_Base_MRC_4 | 0.7348 | 0.4735 | 0.8040 | 0.6647 | 0.7838 | 0.8043 | 0.5618 | 0.5810 | 0.4598 | 0.7332
2020-05-15 | QA_Base_MRC_3 | 0.7322 | 0.4852 | 0.7958 | 0.6562 | 0.7842 | 0.8044 | 0.5679 | 0.5730 | 0.4511 | 0.7171
2020-05-15 | QA_Base_MRC_5 | 0.7274 | 0.4858 | 0.7877 | 0.6550 | 0.7754 | 0.8047 | 0.5405 | 0.5619 | 0.4598 | 0.7084
2022-09-18 | pix2struct-base | 0.7213 | 0.4111 | 0.8386 | 0.7253 | 0.7503 | 0.6407 | 0.4211 | 0.5753 | 0.6552 | 0.5822
2023-01-27 | LayoutLM-base+GNN | 0.6984 | 0.4747 | 0.7973 | 0.6848 | 0.7322 | 0.6323 | 0.4398 | 0.5599 | 0.5431 | 0.5388
2021-12-05 | Electra Large Squad | 0.6961 | 0.4485 | 0.7703 | 0.6348 | 0.7364 | 0.7644 | 0.4594 | 0.5438 | 0.5172 | 0.6470
2023-05-25 | YoBerDaV1 Multi-page | 0.6904 | 0.3481 | 0.8335 | 0.6411 | 0.7253 | 0.6854 | 0.4191 | 0.6299 | 0.5517 | 0.6129
2020-05-16 | HyperDQA_V4 | 0.6893 | 0.3874 | 0.7792 | 0.6309 | 0.7478 | 0.7187 | 0.4867 | 0.5630 | 0.4138 | 0.5685
2020-05-16 | HyperDQA_V3 | 0.6769 | 0.3876 | 0.7774 | 0.6167 | 0.7332 | 0.6961 | 0.4296 | 0.5373 | 0.4138 | 0.5650
2020-05-16 | HyperDQA_V2 | 0.6734 | 0.3818 | 0.7666 | 0.6110 | 0.7332 | 0.6867 | 0.4834 | 0.5560 | 0.3793 | 0.5902
2020-05-09 | HyperDQA_V1 | 0.6717 | 0.4013 | 0.7693 | 0.6197 | 0.7167 | 0.6922 | 0.3598 | 0.5596 | 0.4138 | 0.5504
2022-09-22 | BROS_BASE (WebViCoB 6.4M) | 0.6563 | 0.3780 | 0.7757 | 0.6681 | 0.6557 | 0.6175 | 0.3497 | 0.5782 | 0.4224 | 0.5754
2021-12-04 | Bert Large | 0.6447 | 0.3502 | 0.7535 | 0.5488 | 0.6920 | 0.7266 | 0.4171 | 0.5254 | 0.5517 | 0.6076
2022-05-23 | Dessurt | 0.6322 | 0.3164 | 0.8058 | 0.6486 | 0.6520 | 0.4852 | 0.2862 | 0.5830 | 0.3793 | 0.4365
2020-05-09 | bert fulldata fintuned | 0.5900 | 0.4169 | 0.6870 | 0.4269 | 0.6710 | 0.7315 | 0.5124 | 0.4900 | 0.4483 | 0.5907
2020-05-01 | bert finetuned | 0.5872 | 0.2986 | 0.7011 | 0.4849 | 0.6359 | 0.6933 | 0.4622 | 0.4751 | 0.4483 | 0.4895
2020-04-30 | HyperDQA_V0 | 0.5715 | 0.3131 | 0.6780 | 0.4732 | 0.6630 | 0.5716 | 0.3623 | 0.4351 | 0.3793 | 0.4941
2022-04-27 | LayoutLMv2, Tesseract OCR eval (dataset OCR trained) | 0.4961 | 0.2544 | 0.5523 | 0.4177 | 0.5495 | 0.5914 | 0.2888 | 0.1361 | 0.2069 | 0.4187
2022-03-29 | LayoutLMv2, Tesseract OCR eval (Tesseract OCR trained) | 0.4815 | 0.2253 | 0.5440 | 0.4216 | 0.5207 | 0.5709 | 0.2430 | 0.1353 | 0.3103 | 0.3859
2020-04-27 | bert | 0.4557 | 0.2233 | 0.5259 | 0.2633 | 0.5113 | 0.7775 | 0.4859 | 0.3565 | 0.0345 | 0.5778
2020-05-16 | UGLIFT v0.1 (Clova OCR) | 0.4417 | 0.1766 | 0.5600 | 0.3178 | 0.5340 | 0.4520 | 0.2253 | 0.3573 | 0.4483 | 0.3356
2022-10-21 | Finetuning LayoutLMv3_Base | 0.3596 | 0.2102 | 0.4498 | 0.3858 | 0.3262 | 0.3496 | 0.1552 | 0.3404 | 0.0345 | 0.2706
2020-05-14 | Plain BERT QA | 0.3524 | 0.1687 | 0.4489 | 0.2029 | 0.4321 | 0.4812 | 0.3517 | 0.3096 | 0.0345 | 0.3747
2020-05-16 | Clova OCR V0 | 0.3489 | 0.0977 | 0.4855 | 0.2670 | 0.3811 | 0.3958 | 0.2489 | 0.2875 | 0.0345 | 0.3062
2020-05-01 | HDNet | 0.3401 | 0.2040 | 0.4688 | 0.2181 | 0.4710 | 0.1916 | 0.2488 | 0.2736 | 0.1379 | 0.2458
2020-05-16 | CLOVA OCR | 0.3296 | 0.1246 | 0.4612 | 0.2455 | 0.3622 | 0.3746 | 0.1692 | 0.2736 | 0.0690 | 0.3205
2020-04-29 | docVQAQV_V0.1 | 0.3016 | 0.2010 | 0.3898 | 0.3810 | 0.2933 | 0.0664 | 0.1842 | 0.2736 | 0.1586 | 0.1695
2020-04-26 | docVQAQV_V0 | 0.2342 | 0.1646 | 0.3133 | 0.2623 | 0.2483 | 0.0549 | 0.2277 | 0.1856 | 0.1034 | 0.1635
2021-02-08 | seq2seq | 0.1081 | 0.0758 | 0.1283 | 0.0829 | 0.1332 | 0.0822 | 0.0786 | 0.0779 | 0.4828 | 0.1052
2020-06-16 | Test Submission | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
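A pipe-separated table like the one above is easy to re-rank programmatically, e.g. to find the strongest method in a single question category rather than by overall score. The sketch below is a minimal, hypothetical example (the helpers `parse` and `rank_by` are not part of any challenge tooling); it embeds the top three rows of the leaderboard as sample data.

```python
# Sketch: parse pipe-separated leaderboard rows and rank methods by any
# per-category column. Sample rows are copied from the table above; the
# helper names are illustrative, not official challenge tooling.

HEADER = ["Date", "Method", "Score", "Figure/Diagram", "Form", "Table/List",
          "Layout", "Free_text", "Image/Photo", "Handwritten", "Yes/No", "Others"]

ROWS = """\
2020-06-13 | Human Performance | 0.9811 | 0.9756 | 0.9825 | 0.9780 | 0.9845 | 0.9839 | 0.9740 | 0.9717 | 0.9974 | 0.9828
2022-10-08 | BAIDU-DI | 0.9016 | 0.6823 | 0.9186 | 0.9139 | 0.9138 | 0.9234 | 0.6841 | 0.7949 | 0.6181 | 0.8344
2022-03-31 | Tencent Youtu | 0.8866 | 0.7576 | 0.9470 | 0.8932 | 0.8821 | 0.8654 | 0.6680 | 0.8877 | 0.4828 | 0.8413
"""

def parse(rows: str) -> list[dict]:
    """Split each row on '|' and convert score columns to floats."""
    entries = []
    for line in rows.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        entry = dict(zip(HEADER, cells))
        for key in HEADER[2:]:          # every column after Method is numeric
            entry[key] = float(entry[key])
        entries.append(entry)
    return entries

def rank_by(entries: list[dict], category: str) -> list[dict]:
    """Return entries sorted best-first by the given column."""
    return sorted(entries, key=lambda e: e[category], reverse=True)

print(rank_by(parse(ROWS), "Score")[0]["Method"])  # Human Performance tops the overall Score column
```

The same pattern scales to the full table: paste all rows into `ROWS` and swap `"Score"` for any category header such as `"Handwritten"` or `"Yes/No"`.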