Method: Human Performance (2020-06-13)

Authors: DocVQA Organizers

Affiliation: CVIT, IIIT Hyderabad, CVC-UAB, Amazon

Description: Human performance on the test set.
A small group of volunteers was asked to enter an answer for each given question and image.

Authors: Qwen Team

Affiliation: Alibaba Group

Description: QwenVL
1. One single model, no ensembling.
2. End-to-end model, no OCR pipeline.
3. Generalist model, no specialist fine-tuning.
Give it a go with our model at https://tongyi.aliyun.com/qianwen and API at https://help.aliyun.com/zh/dashscope/developer-reference/vl-plus-quick-start/
Follow us at https://github.com/QwenLM/Qwen-VL

Ranking Table

| Date | Method | Score | Figure/Diagram | Form | Table/List | Layout | Free_text | Image/Photo | Handwritten | Yes/No | Others |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-06-13 | Human Performance | 0.9811 | 0.9756 | 0.9825 | 0.9780 | 0.9845 | 0.9839 | 0.9740 | 0.9717 | 0.9974 | 0.9828 |
| 2024-01-24 | qwenvl-max (single generalist model) | 0.9307 | 0.8491 | 0.9474 | 0.9195 | 0.9403 | 0.9380 | 0.8652 | 0.8922 | 0.8621 | 0.9341 |
| 2024-04-27 | InternVL 1.5 Plus (generalist) | 0.9234 | 0.8354 | 0.9556 | 0.9123 | 0.9397 | 0.9032 | 0.8313 | 0.9064 | 0.9655 | 0.9098 |
| 2023-12-07 | qwenvl-plus (single generalist model) | 0.9141 | 0.8146 | 0.9464 | 0.8999 | 0.9277 | 0.9265 | 0.8419 | 0.8776 | 0.9310 | 0.8667 |
| 2024-04-20 | InternVL 1.5 (generalist) | 0.9085 | 0.8185 | 0.9416 | 0.8940 | 0.9306 | 0.8877 | 0.8373 | 0.8830 | 0.7931 | 0.8698 |
| 2023-11-15 | SMoLA-PaLI-X Specialist Model | 0.9084 | 0.7790 | 0.9416 | 0.8934 | 0.9262 | 0.9188 | 0.7911 | 0.8508 | 0.8966 | 0.8456 |
| 2023-12-07 | SMoLA-PaLI-X Generalist Model | 0.9055 | 0.7757 | 0.9381 | 0.8924 | 0.9187 | 0.9179 | 0.8364 | 0.8483 | 0.7446 | 0.8609 |
| 2024-05-01 | Snowflake Arctic-TILT 0.8B (fine-tuned) | 0.9020 | 0.7198 | 0.9398 | 0.9152 | 0.9015 | 0.9042 | 0.6860 | 0.8415 | 0.6897 | 0.8604 |
| 2022-10-08 | BAIDU-DI | 0.9016 | 0.6823 | 0.9186 | 0.9139 | 0.9138 | 0.9234 | 0.6841 | 0.7949 | 0.6181 | 0.8344 |
| 2024-04-02 | InternLM-XComposer2-4KHD-7B | 0.9002 | 0.8041 | 0.9400 | 0.8965 | 0.9143 | 0.8618 | 0.7845 | 0.8264 | 0.8621 | 0.8298 |
| 2024-02-10 | ScreenAI 5B | 0.8988 | 0.7297 | 0.9419 | 0.8928 | 0.9158 | 0.8873 | 0.7722 | 0.8160 | 0.8966 | 0.8551 |
| 2024-05-01 | Snowflake Arctic-TILT 0.8B (zero-shot) | 0.8881 | 0.6826 | 0.9311 | 0.9011 | 0.8867 | 0.8917 | 0.6534 | 0.8219 | 0.6897 | 0.8515 |
| 2022-03-31 | Tencent Youtu | 0.8866 | 0.7576 | 0.9470 | 0.8932 | 0.8821 | 0.8654 | 0.6680 | 0.8877 | 0.4828 | 0.8413 |
| 2022-01-13 | ERNIE-Layout 2.0 | 0.8841 | 0.6434 | 0.9177 | 0.8996 | 0.8899 | 0.9010 | 0.6223 | 0.7836 | 0.6124 | 0.8118 |
| 2023-12-10 | DocFormerv2 (Single Model with 750M Parameters) | 0.8784 | 0.6680 | 0.9382 | 0.9076 | 0.8676 | 0.8555 | 0.5840 | 0.8123 | 0.8276 | 0.8070 |
| 2021-11-26 | Mybank-DocReader | 0.8755 | 0.6682 | 0.9233 | 0.8763 | 0.8896 | 0.8713 | 0.6290 | 0.8047 | 0.5805 | 0.7804 |
| 2021-09-06 | ERNIE-Layout 1.0 | 0.8753 | 0.6586 | 0.8972 | 0.8864 | 0.8902 | 0.8943 | 0.6392 | 0.7331 | 0.5434 | 0.8115 |
| 2021-02-12 | Applica.ai TILT | 0.8705 | 0.6082 | 0.9459 | 0.8980 | 0.8592 | 0.8581 | 0.5508 | 0.8139 | 0.6897 | 0.7788 |
| 2023-05-31 | PaLI-X (Google Research; Single Generative Model) | 0.8679 | 0.6971 | 0.8992 | 0.8400 | 0.8955 | 0.8925 | 0.7589 | 0.7209 | 0.8966 | 0.8468 |
| 2020-12-22 | LayoutLM 2.0 (single model) | 0.8672 | 0.6574 | 0.8953 | 0.8769 | 0.8791 | 0.8707 | 0.7287 | 0.6729 | 0.5517 | 0.8103 |
| 2024-01-24 | nnrc_vary | 0.8631 | 0.6689 | 0.9174 | 0.8354 | 0.8876 | 0.8761 | 0.6891 | 0.8269 | 0.6207 | 0.7696 |
| 2023-12-10 | 54_nnrc_zephyr | 0.8560 | 0.6170 | 0.8924 | 0.8603 | 0.8546 | 0.9020 | 0.6083 | 0.8142 | 0.7488 | 0.8386 |
| 2020-08-16 | Alibaba DAMO NLP | 0.8506 | 0.6650 | 0.8809 | 0.8552 | 0.8733 | 0.8397 | 0.6758 | 0.7691 | 0.5492 | 0.7526 |
| 2020-05-16 | PingAn-OneConnect-Gammalab-DQA | 0.8484 | 0.6059 | 0.9021 | 0.8463 | 0.8730 | 0.8337 | 0.5812 | 0.7692 | 0.5172 | 0.7289 |
| 2024-01-21 | Spatial LLM v1.2 | 0.8443 | 0.6300 | 0.8917 | 0.8180 | 0.8644 | 0.8877 | 0.6106 | 0.7390 | 0.6897 | 0.8097 |
| 2023-02-21 | LayoutLMv2_star_seg_large | 0.8430 | 0.7008 | 0.8737 | 0.8389 | 0.8536 | 0.8498 | 0.6872 | 0.7823 | 0.6181 | 0.8252 |
| 2024-01-12 | Spatial LLM v1.1 | 0.8406 | 0.6128 | 0.8872 | 0.8127 | 0.8615 | 0.8991 | 0.6406 | 0.7404 | 0.6897 | 0.8083 |
| 2023-06-30 | LATIN-Prompt + Claude (Zero shot) | 0.8336 | 0.6601 | 0.8553 | 0.8584 | 0.8169 | 0.8726 | 0.6021 | 0.6774 | 0.7126 | 0.8258 |
| 2023-12-01 | nnrc mplugowl2_9k | 0.8281 | 0.5780 | 0.8949 | 0.7860 | 0.8662 | 0.8631 | 0.6302 | 0.8054 | 0.5517 | 0.7867 |
| 2024-01-10 | Spatial LLM v1 | 0.8244 | 0.5842 | 0.8708 | 0.7949 | 0.8457 | 0.8986 | 0.6095 | 0.7167 | 0.6207 | 0.8082 |
| 2023-11-27 | 36_nnrc_llama2 | 0.8239 | 0.5404 | 0.8787 | 0.7958 | 0.8475 | 0.8813 | 0.5995 | 0.7991 | 0.6897 | 0.7922 |
| 2024-01-11 | nnrc_udop_224_6ds | 0.8227 | 0.5909 | 0.8706 | 0.8352 | 0.8335 | 0.8086 | 0.5972 | 0.6835 | 0.5862 | 0.7472 |
| 2023-05-06 | Docugami-Layout | 0.8031 | 0.5176 | 0.8875 | 0.7902 | 0.8214 | 0.8026 | 0.5089 | 0.7753 | 0.4224 | 0.7022 |
| 2024-03-01 | Vary | 0.7916 | 0.7415 | 0.7949 | 0.7378 | 0.8475 | 0.8101 | 0.6671 | 0.6552 | 0.7471 | 0.7888 |
| 2022-01-07 | LayoutLMV2-large on Textract | 0.7873 | 0.4924 | 0.8771 | 0.8218 | 0.7726 | 0.7661 | 0.4820 | 0.7276 | 0.3793 | 0.6983 |
| 2023-01-29 | LayoutLMv2_star_seg | 0.7859 | 0.5328 | 0.8406 | 0.7859 | 0.8128 | 0.7909 | 0.4879 | 0.6468 | 0.3644 | 0.6953 |
| 2023-05-25 | YoBerDaV2 Single-page | 0.7749 | 0.4737 | 0.8894 | 0.7586 | 0.7962 | 0.7398 | 0.4763 | 0.7173 | 0.7586 | 0.6976 |
| 2020-05-14 | Structural LM-v2 | 0.7674 | 0.4931 | 0.8381 | 0.7621 | 0.7924 | 0.7596 | 0.4756 | 0.6282 | 0.5517 | 0.6549 |
| 2022-09-18 | pix2struct-large | 0.7656 | 0.4424 | 0.8827 | 0.7702 | 0.7774 | 0.7085 | 0.5383 | 0.6320 | 0.7586 | 0.6536 |
| 2022-12-28 | Submission_ErnieLayout_base_finetuned_on_DocVQA_en_train_dev_textract_word_segments_ck-14000 | 0.7599 | 0.4313 | 0.8678 | 0.7726 | 0.7641 | 0.7330 | 0.4598 | 0.6957 | 0.4828 | 0.6097 |
| 2024-02-13 | instructblip | 0.7429 | 0.5158 | 0.7918 | 0.7019 | 0.7751 | 0.8088 | 0.5765 | 0.5892 | 0.5172 | 0.7062 |
| 2020-05-15 | QA_Base_MRC_2 | 0.7415 | 0.4854 | 0.8015 | 0.6738 | 0.7943 | 0.8136 | 0.5740 | 0.5831 | 0.5287 | 0.7161 |
| 2020-05-15 | QA_Base_MRC_1 | 0.7407 | 0.4890 | 0.7984 | 0.6675 | 0.7936 | 0.8131 | 0.5854 | 0.6099 | 0.4943 | 0.7384 |
| 2020-05-15 | QA_Base_MRC_4 | 0.7348 | 0.4735 | 0.8040 | 0.6647 | 0.7838 | 0.8043 | 0.5618 | 0.5810 | 0.4598 | 0.7332 |
| 2020-05-15 | QA_Base_MRC_3 | 0.7322 | 0.4852 | 0.7958 | 0.6562 | 0.7842 | 0.8044 | 0.5679 | 0.5730 | 0.4511 | 0.7171 |
| 2024-01-22 | OCRF-ALT-c30 | 0.7285 | 0.3822 | 0.8695 | 0.7234 | 0.7508 | 0.6717 | 0.3656 | 0.6748 | 0.6897 | 0.5507 |
| 2020-05-15 | QA_Base_MRC_5 | 0.7274 | 0.4858 | 0.7877 | 0.6550 | 0.7754 | 0.8047 | 0.5405 | 0.5619 | 0.4598 | 0.7084 |
| 2022-09-18 | pix2struct-base | 0.7213 | 0.4111 | 0.8386 | 0.7253 | 0.7503 | 0.6407 | 0.4211 | 0.5753 | 0.6552 | 0.5822 |
| 2024-04-02 | MiniCPM-V-2 | 0.7187 | 0.6012 | 0.8062 | 0.6312 | 0.7880 | 0.6753 | 0.6834 | 0.6789 | 0.7586 | 0.6464 |
| 2023-01-27 | LayoutLM-base+GNN | 0.6984 | 0.4747 | 0.7973 | 0.6848 | 0.7322 | 0.6323 | 0.4398 | 0.5599 | 0.5431 | 0.5388 |
| 2021-12-05 | Electra Large Squad | 0.6961 | 0.4485 | 0.7703 | 0.6348 | 0.7364 | 0.7644 | 0.4594 | 0.5438 | 0.5172 | 0.6470 |
| 2023-05-25 | YoBerDaV1 Multi-page | 0.6904 | 0.3481 | 0.8335 | 0.6411 | 0.7253 | 0.6854 | 0.4191 | 0.6299 | 0.5517 | 0.6129 |
| 2020-05-16 | HyperDQA_V4 | 0.6893 | 0.3874 | 0.7792 | 0.6309 | 0.7478 | 0.7187 | 0.4867 | 0.5630 | 0.4138 | 0.5685 |
| 2020-05-16 | HyperDQA_V3 | 0.6769 | 0.3876 | 0.7774 | 0.6167 | 0.7332 | 0.6961 | 0.4296 | 0.5373 | 0.4138 | 0.5650 |
| 2023-07-06 | GPT3.5 | 0.6759 | 0.4741 | 0.7144 | 0.6524 | 0.7036 | 0.6858 | 0.5385 | 0.5038 | 0.5954 | 0.6660 |
| 2020-05-16 | HyperDQA_V2 | 0.6734 | 0.3818 | 0.7666 | 0.6110 | 0.7332 | 0.6867 | 0.4834 | 0.5560 | 0.3793 | 0.5902 |
| 2020-05-09 | HyperDQA_V1 | 0.6717 | 0.4013 | 0.7693 | 0.6197 | 0.7167 | 0.6922 | 0.3598 | 0.5596 | 0.4138 | 0.5504 |
| 2023-08-15 | LATIN-Tuning-Prompt + Alpaca (Zero-shot) | 0.6687 | 0.3732 | 0.7529 | 0.6545 | 0.6615 | 0.7463 | 0.5439 | 0.4941 | 0.3481 | 0.6831 |
| 2023-07-14 | donut_base | 0.6590 | 0.3960 | 0.8407 | 0.6604 | 0.6987 | 0.4630 | 0.2969 | 0.6964 | 0.0345 | 0.5057 |
| 2023-12-04 | ViTLP | 0.6588 | 0.3880 | 0.8220 | 0.6705 | 0.6962 | 0.4670 | 0.2973 | 0.6307 | 0.4483 | 0.4910 |
| 2023-12-21 | DocVQA: A Dataset for VQA on Document Images | 0.6566 | 0.3569 | 0.7645 | 0.5775 | 0.7000 | 0.7205 | 0.4220 | 0.4802 | 0.4483 | 0.6108 |
| 2022-09-22 | BROS_BASE (WebViCoB 6.4M) | 0.6563 | 0.3780 | 0.7757 | 0.6681 | 0.6557 | 0.6175 | 0.3497 | 0.5782 | 0.4224 | 0.5754 |
| 2023-09-24 | Layoutlm_DocVQA+Token_v2 | 0.6562 | 0.3935 | 0.7764 | 0.6228 | 0.6737 | 0.6711 | 0.3385 | 0.5109 | 0.5086 | 0.5515 |
| 2023-07-21 | donut_half_input_imageSize | 0.6536 | 0.3930 | 0.8366 | 0.6548 | 0.6950 | 0.4609 | 0.2486 | 0.6940 | 0.0345 | 0.4941 |
| 2021-12-04 | Bert Large | 0.6447 | 0.3502 | 0.7535 | 0.5488 | 0.6920 | 0.7266 | 0.4171 | 0.5254 | 0.5517 | 0.6076 |
| 2022-05-23 | Dessurt | 0.6322 | 0.3164 | 0.8058 | 0.6486 | 0.6520 | 0.4852 | 0.2862 | 0.5830 | 0.3793 | 0.4365 |
| 2024-03-17 | DOLMA | 0.6205 | 0.3336 | 0.7625 | 0.6009 | 0.6553 | 0.5347 | 0.3283 | 0.4656 | 0.5172 | 0.4913 |
| 2024-01-09 | dolma | 0.6196 | 0.4003 | 0.7642 | 0.5805 | 0.6609 | 0.5247 | 0.3958 | 0.5596 | 0.5690 | 0.4972 |
| 2020-05-09 | bert fulldata fintuned | 0.5900 | 0.4169 | 0.6870 | 0.4269 | 0.6710 | 0.7315 | 0.5124 | 0.4900 | 0.4483 | 0.5907 |
| 2020-05-01 | bert finetuned | 0.5872 | 0.2986 | 0.7011 | 0.4849 | 0.6359 | 0.6933 | 0.4622 | 0.4751 | 0.4483 | 0.4895 |
| 2020-04-30 | HyperDQA_V0 | 0.5715 | 0.3131 | 0.6780 | 0.4732 | 0.6630 | 0.5716 | 0.3623 | 0.4351 | 0.3793 | 0.4941 |
| 2023-09-26 | LayoutLM_Docvqa+Token_v0 | 0.4980 | 0.2319 | 0.6035 | 0.4320 | 0.5684 | 0.4779 | 0.2768 | 0.3081 | 0.1293 | 0.4178 |
| 2022-04-27 | LayoutLMv2, Tesseract OCR eval (dataset OCR trained) | 0.4961 | 0.2544 | 0.5523 | 0.4177 | 0.5495 | 0.5914 | 0.2888 | 0.1361 | 0.2069 | 0.4187 |
| 2022-03-29 | LayoutLMv2, Tesseract OCR eval (Tesseract OCR trained) | 0.4815 | 0.2253 | 0.5440 | 0.4216 | 0.5207 | 0.5709 | 0.2430 | 0.1353 | 0.3103 | 0.3859 |
| 2023-07-26 | donut_large_encoderSize_finetuned_20_epoch | 0.4673 | 0.2236 | 0.6691 | 0.4581 | 0.5026 | 0.2665 | 0.1356 | 0.4983 | 0.5734 | 0.3430 |
| 2020-04-27 | bert | 0.4557 | 0.2233 | 0.5259 | 0.2633 | 0.5113 | 0.7775 | 0.4859 | 0.3565 | 0.0345 | 0.5778 |
| 2020-05-16 | UGLIFT v0.1 (Clova OCR) | 0.4417 | 0.1766 | 0.5600 | 0.3178 | 0.5340 | 0.4520 | 0.2253 | 0.3573 | 0.4483 | 0.3356 |
| 2022-10-21 | Finetuning LayoutLMv3_Base | 0.3596 | 0.2102 | 0.4498 | 0.3858 | 0.3262 | 0.3496 | 0.1552 | 0.3404 | 0.0345 | 0.2706 |
| 2023-09-19 | testtest | 0.3569 | 0.3018 | 0.3407 | 0.2748 | 0.4693 | 0.3186 | 0.2682 | 0.2753 | 0.6207 | 0.3356 |
| 2020-05-14 | Plain BERT QA | 0.3524 | 0.1687 | 0.4489 | 0.2029 | 0.4321 | 0.4812 | 0.3517 | 0.3096 | 0.0345 | 0.3747 |
| 2020-05-16 | Clova OCR V0 | 0.3489 | 0.0977 | 0.4855 | 0.2670 | 0.3811 | 0.3958 | 0.2489 | 0.2875 | 0.0345 | 0.3062 |
| 2020-05-01 | HDNet | 0.3401 | 0.2040 | 0.4688 | 0.2181 | 0.4710 | 0.1916 | 0.2488 | 0.2736 | 0.1379 | 0.2458 |
| 2020-05-16 | CLOVA OCR | 0.3296 | 0.1246 | 0.4612 | 0.2455 | 0.3622 | 0.3746 | 0.1692 | 0.2736 | 0.0690 | 0.3205 |
| 2023-07-21 | donut_small_encoderSize_finetuned_20_epoch | 0.3157 | 0.1935 | 0.4417 | 0.2912 | 0.3400 | 0.2075 | 0.1495 | 0.2658 | 0.3103 | 0.2644 |
| 2020-04-29 | docVQAQV_V0.1 | 0.3016 | 0.2010 | 0.3898 | 0.3810 | 0.2933 | 0.0664 | 0.1842 | 0.2736 | 0.1586 | 0.1695 |
| 2020-04-26 | docVQAQV_V0 | 0.2342 | 0.1646 | 0.3133 | 0.2623 | 0.2483 | 0.0549 | 0.2277 | 0.1856 | 0.1034 | 0.1635 |
| 2021-02-08 | seq2seq | 0.1081 | 0.0758 | 0.1283 | 0.0829 | 0.1332 | 0.0822 | 0.0786 | 0.0779 | 0.4828 | 0.1052 |
| 2024-01-23 | lixiang-vlm-7b-handled | 0.0990 | 0.0478 | 0.0798 | 0.0348 | 0.1648 | 0.0863 | 0.1309 | 0.1395 | 0.5517 | 0.1191 |
| 2024-01-24 | lixiang-vlm-7b | 0.0631 | 0.0313 | 0.0693 | 0.0272 | 0.0894 | 0.0639 | 0.0122 | 0.1145 | 0.5517 | 0.0826 |
| 2024-01-21 | lixiang-vlm handled | 0.0536 | 0.0243 | 0.0272 | 0.0097 | 0.1084 | 0.0400 | 0.0605 | 0.0395 | 0.1034 | 0.0568 |
| 2024-01-21 | lixiang-vlm | 0.0264 | 0.0176 | 0.0123 | 0.0045 | 0.0502 | 0.0262 | 0.0078 | 0.0291 | 0.1034 | 0.0273 |
| 2020-06-16 | Test Submission | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
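The Score columns are ANLS values (Average Normalized Levenshtein Similarity), the standard DocVQA metric: each prediction is compared against every accepted ground-truth answer, matches with normalized edit distance at or above a 0.5 threshold score zero, and the best per-question similarity is averaged over all questions. A minimal sketch of that computation (the function names and lowercasing details here are illustrative, not the official evaluation script):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            # dp[j]: deletion; dp[j-1]: insertion; prev: substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1,
                                     dp[j - 1] + 1,
                                     prev + (a[i - 1] != b[j - 1]))
    return dp[n]

def anls(predictions, ground_truths, tau=0.5):
    """ANLS over all questions.

    predictions: one answer string per question.
    ground_truths: a list of accepted answer strings per question.
    """
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for gt in answers:
            p, g = pred.strip().lower(), gt.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            # Similarities below the threshold tau count as zero.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)
```

For example, an exact match scores 1.0, a one-character slip over a five-character answer scores 0.8, and an unrelated answer scores 0.0, which is why near-miss OCR errors still earn partial credit on this leaderboard.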

Ranking Graphic: (interactive score-over-time chart not reproduced)