method: Human Performance (2020-06-13)

Authors: DocVQA Organizers

Affiliation: CVIT, IIIT Hyderabad, CVC-UAB, Amazon

Description: Human performance on the test set.
A small group of volunteers was asked to enter an answer for each question and its accompanying image.

method: Seed-VL-1.5 (2025-05-13)

Authors: Seed-VL

Affiliation: ByteDance

Description: Seed-VL-1.5

@misc{guo2025seed15vltechnicalreport, title={Seed1.5-VL Technical Report}, author={Dong Guo and Faming Wu and Feida Zhu and Fuxing Leng and Guang Shi and Haobin Chen and Haoqi Fan and Jian Wang and Jianyu Jiang and Jiawei Wang and Jingji Chen and Jingjia Huang and Kang Lei and Liping Yuan and Lishu Luo and Pengfei Liu and Qinghao Ye and Rui Qian and Shen Yan and Shixiong Zhao and Shuai Peng and Shuangye Li and Sihang Yuan and Sijin Wu and Tianheng Cheng and Weiwei Liu and Wenqian Wang and Xianhan Zeng and Xiao Liu and Xiaobo Qin and Xiaohan Ding and Xiaojun Xiao and Xiaoying Zhang and Xuanwei Zhang and Xuehan Xiong and Yanghua Peng and Yangrui Chen and Yanwei Li and Yanxu Hu and Yi Lin and Yiyuan Hu and Yiyuan Zhang and Youbin Wu and Yu Li and Yudong Liu and Yue Ling and Yujia Qin and Zanbo Wang and Zhiwu He and Aoxue Zhang and Bairen Yi and Bencheng Liao and Can Huang and Can Zhang and Chaorui Deng and Chaoyi Deng and Cheng Lin and Cheng Yuan and Chenggang Li and Chenhui Gou and Chenwei Lou and Chengzhi Wei and Chundian Liu and Chunyuan Li and Deyao Zhu and Donghong Zhong and Feng Li and Feng Zhang and Gang Wu and Guodong Li and Guohong Xiao and Haibin Lin and Haihua Yang and Haoming Wang and Heng Ji and Hongxiang Hao and Hui Shen and Huixia Li and Jiahao Li and Jialong Wu and Jianhua Zhu and Jianpeng Jiao and Jiashi Feng and Jiaze Chen and Jianhui Duan and Jihao Liu and Jin Zeng and Jingqun Tang and Jingyu Sun and Joya Chen and Jun Long and Junda Feng and Junfeng Zhan and Junjie Fang and Junting Lu and Kai Hua and Kai Liu and Kai Shen and Kaiyuan Zhang and Ke Shen and Ke Wang and Keyu Pan and Kun Zhang and Kunchang Li and Lanxin Li and Lei Li and Lei Shi and Li Han and Liang Xiang and Liangqiang Chen and Lin Chen and Lin Li and Lin Yan and Liying Chi and Longxiang Liu and Mengfei Du and Mingxuan Wang and Ningxin Pan and Peibin Chen and Pengfei Chen and Pengfei Wu and Qingqing Yuan and Qingyao Shuai and Qiuyan Tao and Renjie Zheng and Renrui Zhang and Ru Zhang and 
Rui Wang and Rui Yang and Rui Zhao and Shaoqiang Xu and Shihao Liang and Shipeng Yan and Shu Zhong and Shuaishuai Cao and Shuangzhi Wu and Shufan Liu and Shuhan Chang and Songhua Cai and Tenglong Ao and Tianhao Yang and Tingting Zhang and Wanjun Zhong and Wei Jia and Wei Weng and Weihao Yu and Wenhao Huang and Wenjia Zhu and Wenli Yang and Wenzhi Wang and Xiang Long and XiangRui Yin and Xiao Li and Xiaolei Zhu and Xiaoying Jia and Xijin Zhang and Xin Liu and Xinchen Zhang and Xinyu Yang and Xiongcai Luo and Xiuli Chen and Xuantong Zhong and Xuefeng Xiao and Xujing Li and Yan Wu and Yawei Wen and Yifan Du and Yihao Zhang and Yining Ye and Yonghui Wu and Yu Liu and Yu Yue and Yufeng Zhou and Yufeng Yuan and Yuhang Xu and Yuhong Yang and Yun Zhang and Yunhao Fang and Yuntao Li and Yurui Ren and Yuwen Xiong and Zehua Hong and Zehua Wang and Zewei Sun and Zeyu Wang and Zhao Cai and Zhaoyue Zha and Zhecheng An and Zhehui Zhao and Zhengzhuo Xu and Zhipeng Chen and Zhiyong Wu and Zhuofan Zheng and Zihao Wang and Zilong Huang and Ziyu Zhu and Zuquan Song}, year={2025}, eprint={2505.07062}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2505.07062}, }

method: qwen2-vl (2024-07-11)

Authors: Qwen Team

Affiliation: Alibaba Group

Description: qwen2-vl

Ranking Table

Date | Method | Score | Figure/Diagram | Form | Table/List | Layout | Free_text | Image/Photo | Handwritten | Yes/No | Others
2020-06-13 | Human Performance | 0.9811 | 0.9756 | 0.9825 | 0.9780 | 0.9845 | 0.9839 | 0.9740 | 0.9717 | 0.9974 | 0.9828
2025-05-13 | Seed-VL-1.5 | 0.9691 | 0.9447 | 0.9815 | 0.9764 | 0.9674 | 0.9582 | 0.9162 | 0.9522 | 1.0000 | 0.9464
2024-07-11 | qwen2-vl | 0.9670 | 0.9206 | 0.9816 | 0.9703 | 0.9678 | 0.9619 | 0.9135 | 0.9436 | 0.9655 | 0.9540
2024-06-30 | InternVL2-Pro (generalist) | 0.9506 | 0.8888 | 0.9714 | 0.9486 | 0.9582 | 0.9446 | 0.8909 | 0.9278 | 0.9655 | 0.9410
2025-01-16 | VideoLLaMA3-7B | 0.9494 | 0.8843 | 0.9692 | 0.9498 | 0.9532 | 0.9427 | 0.8838 | 0.9293 | 0.9310 | 0.9313
2025-04-03 | test | 0.9406 | 0.8864 | 0.9603 | 0.9432 | 0.9417 | 0.9189 | 0.8639 | 0.9153 | 0.8966 | 0.9231
2024-09-25 | Molmo-72B | 0.9351 | 0.8822 | 0.9548 | 0.9387 | 0.9413 | 0.9100 | 0.8688 | 0.9196 | 0.9195 | 0.9229
2025-02-26 | Qwen2.5-3B-lite | 0.9342 | 0.8807 | 0.9622 | 0.9316 | 0.9397 | 0.9128 | 0.9022 | 0.9234 | 0.7931 | 0.8887
2024-12-13 | DeepSeek-VL2 | 0.9330 | 0.8853 | 0.9575 | 0.9364 | 0.9309 | 0.9214 | 0.8685 | 0.8988 | 0.8966 | 0.9008
2024-01-24 | qwenvl-max (single generalist model) | 0.9307 | 0.8491 | 0.9474 | 0.9195 | 0.9403 | 0.9380 | 0.8652 | 0.8922 | 0.8621 | 0.9341
2024-05-10 | Vary (using multi crop) | 0.9241 | 0.8926 | 0.9372 | 0.8953 | 0.9405 | 0.9447 | 0.9035 | 0.9335 | 0.8739 | 0.9478
2024-04-27 | InternVL-1.5-Plus (generalist) | 0.9234 | 0.8354 | 0.9556 | 0.9123 | 0.9397 | 0.9032 | 0.8313 | 0.9064 | 0.9655 | 0.9098
2024-11-01 | MLCD-Embodied-7B: Multi-label Cluster Discrimination for Visual Representation Learning | 0.9158 | 0.8286 | 0.9315 | 0.9131 | 0.9289 | 0.9088 | 0.7804 | 0.8300 | 0.8897 | 0.8796
2023-12-07 | qwenvl-plus (single generalist model) | 0.9141 | 0.8146 | 0.9464 | 0.8999 | 0.9277 | 0.9265 | 0.8419 | 0.8776 | 0.9310 | 0.8667
2023-11-15 | SMoLA-PaLI-X Specialist Model | 0.9084 | 0.7790 | 0.9416 | 0.8934 | 0.9262 | 0.9188 | 0.7911 | 0.8508 | 0.8966 | 0.8456
2025-01-08 | PP-DocBee-2B | 0.9056 | 0.7998 | 0.9541 | 0.8910 | 0.9211 | 0.8800 | 0.8901 | 0.8911 | 0.7561 | 0.8893
2023-12-07 | SMoLA-PaLI-X Generalist Model | 0.9055 | 0.7757 | 0.9381 | 0.8924 | 0.9187 | 0.9179 | 0.8364 | 0.8483 | 0.7446 | 0.8609
2024-05-01 | Snowflake Arctic-TILT 0.8B (fine-tuned) | 0.9020 | 0.7198 | 0.9398 | 0.9152 | 0.9015 | 0.9042 | 0.6860 | 0.8415 | 0.6897 | 0.8604
2022-10-08 | BAIDU-DI | 0.9016 | 0.6823 | 0.9186 | 0.9139 | 0.9138 | 0.9234 | 0.6841 | 0.7949 | 0.6181 | 0.8344
2024-04-02 | InternLM-XComposer2-4KHD-7B | 0.9002 | 0.8041 | 0.9400 | 0.8965 | 0.9143 | 0.8618 | 0.7845 | 0.8264 | 0.8621 | 0.8298
2024-02-10 | ScreenAI 5B | 0.8988 | 0.7297 | 0.9419 | 0.8928 | 0.9158 | 0.8873 | 0.7722 | 0.8160 | 0.8966 | 0.8551
2024-05-01 | Snowflake Arctic-TILT 0.8B (zero-shot) | 0.8881 | 0.6826 | 0.9311 | 0.9011 | 0.8867 | 0.8917 | 0.6534 | 0.8219 | 0.6897 | 0.8515
2022-03-31 | Tencent Youtu | 0.8866 | 0.7576 | 0.9470 | 0.8932 | 0.8821 | 0.8654 | 0.6680 | 0.8877 | 0.4828 | 0.8413
2022-01-13 | ERNIE-Layout 2.0 | 0.8841 | 0.6434 | 0.9177 | 0.8996 | 0.8899 | 0.9010 | 0.6223 | 0.7836 | 0.6124 | 0.8118
2023-12-10 | DocFormerv2 (Single Model with 750M Parameters) | 0.8784 | 0.6680 | 0.9382 | 0.9076 | 0.8676 | 0.8555 | 0.5840 | 0.8123 | 0.8276 | 0.8070
2024-10-30 | BlueLM-V-3B | 0.8775 | 0.7652 | 0.9245 | 0.8659 | 0.9005 | 0.8372 | 0.8079 | 0.8276 | 0.7931 | 0.7734
2024-09-08 | neetolab-sota-v1 | 0.8759 | 0.7938 | 0.9209 | 0.8577 | 0.8946 | 0.8558 | 0.8011 | 0.8664 | 0.6207 | 0.8261
2021-11-26 | Mybank-DocReader | 0.8755 | 0.6682 | 0.9233 | 0.8763 | 0.8896 | 0.8713 | 0.6290 | 0.8047 | 0.5805 | 0.7804
2021-09-06 | ERNIE-Layout 1.0 | 0.8753 | 0.6586 | 0.8972 | 0.8864 | 0.8902 | 0.8943 | 0.6392 | 0.7331 | 0.5434 | 0.8115
2025-05-08 | PeKi_DocVQA | 0.8742 | 0.7651 | 0.9372 | 0.8513 | 0.8940 | 0.8356 | 0.7789 | 0.8507 | 0.8276 | 0.8147
2024-08-22 | Mini-Monkey | 0.8738 | 0.7334 | 0.9350 | 0.8493 | 0.9046 | 0.8383 | 0.7931 | 0.8262 | 0.6782 | 0.7628
2024-05-31 | GPT-4 Vision Turbo + Amazon Textract OCR | 0.8736 | 0.7346 | 0.9196 | 0.8756 | 0.8678 | 0.8709 | 0.8137 | 0.8681 | 0.8966 | 0.8464
2021-02-12 | Applica.ai TILT | 0.8705 | 0.6082 | 0.9459 | 0.8980 | 0.8592 | 0.8581 | 0.5508 | 0.8139 | 0.6897 | 0.7788
2023-05-31 | PaLI-X (Google Research; Single Generative Model) | 0.8679 | 0.6971 | 0.8992 | 0.8400 | 0.8955 | 0.8925 | 0.7589 | 0.7209 | 0.8966 | 0.8468
2020-12-22 | LayoutLM 2.0 (single model) | 0.8672 | 0.6574 | 0.8953 | 0.8769 | 0.8791 | 0.8707 | 0.7287 | 0.6729 | 0.5517 | 0.8103
2024-01-24 | nnrc_vary | 0.8631 | 0.6689 | 0.9174 | 0.8354 | 0.8876 | 0.8761 | 0.6891 | 0.8269 | 0.6207 | 0.7696
2023-12-10 | 54_nnrc_zephyr | 0.8560 | 0.6170 | 0.8924 | 0.8603 | 0.8546 | 0.9020 | 0.6083 | 0.8142 | 0.7488 | 0.8386
2020-08-16 | Alibaba DAMO NLP | 0.8506 | 0.6650 | 0.8809 | 0.8552 | 0.8733 | 0.8397 | 0.6758 | 0.7691 | 0.5492 | 0.7526
2020-05-16 | PingAn-OneConnect-Gammalab-DQA | 0.8484 | 0.6059 | 0.9021 | 0.8463 | 0.8730 | 0.8337 | 0.5812 | 0.7692 | 0.5172 | 0.7289
2024-05-01 | PaliGemma-3B (finetune, 896px) | 0.8477 | 0.6543 | 0.9252 | 0.8326 | 0.8733 | 0.8099 | 0.7382 | 0.8314 | 0.7931 | 0.7571
2024-01-21 | Spatial LLM v1.2 | 0.8443 | 0.6300 | 0.8917 | 0.8180 | 0.8644 | 0.8877 | 0.6106 | 0.7390 | 0.6897 | 0.8097
2023-02-21 | LayoutLMv2_star_seg_large | 0.8430 | 0.7008 | 0.8737 | 0.8389 | 0.8536 | 0.8498 | 0.6872 | 0.7823 | 0.6181 | 0.8252
2025-04-30 | Vlm(qwen) | 0.8411 | 0.7205 | 0.9398 | 0.8492 | 0.8425 | 0.7352 | 0.7833 | 0.8945 | 0.8276 | 0.8009
2024-06-26 | MoVA-8B (generalist) | 0.8341 | 0.7639 | 0.8494 | 0.8131 | 0.8752 | 0.8187 | 0.6503 | 0.7048 | 0.5172 | 0.7901
2023-06-30 | LATIN-Prompt + Claude (Zero shot) | 0.8336 | 0.6601 | 0.8553 | 0.8584 | 0.8169 | 0.8726 | 0.6021 | 0.6774 | 0.7126 | 0.8258
2024-10-09 | llama3-qwenvit | 0.8318 | 0.7377 | 0.8928 | 0.7806 | 0.8822 | 0.8009 | 0.7732 | 0.7934 | 0.5862 | 0.7531
2024-09-13 | gemma+ocr | 0.8282 | 0.6326 | 0.8395 | 0.8298 | 0.8346 | 0.8778 | 0.6417 | 0.6176 | 0.6410 | 0.8113
2023-11-27 | 36_nnrc_llama2 | 0.8239 | 0.5404 | 0.8787 | 0.7958 | 0.8475 | 0.8813 | 0.5995 | 0.7991 | 0.6897 | 0.7922
2024-01-11 | nnrc_udop_224_6ds | 0.8227 | 0.5909 | 0.8706 | 0.8352 | 0.8335 | 0.8086 | 0.5972 | 0.6835 | 0.5862 | 0.7472
2024-08-02 | loixc-onestage | 0.8221 | 0.6215 | 0.8463 | 0.8020 | 0.8593 | 0.8407 | 0.5835 | 0.6453 | 0.7241 | 0.7352
2024-07-26 | loixc-vqa | 0.8127 | 0.6182 | 0.8188 | 0.7878 | 0.8560 | 0.8496 | 0.5840 | 0.5984 | 0.3993 | 0.7445
2025-04-30 | Vis(qwen) | 0.8093 | 0.7030 | 0.9266 | 0.8286 | 0.7925 | 0.6996 | 0.8201 | 0.8781 | 0.7931 | 0.7816
2023-05-06 | Docugami-Layout | 0.8031 | 0.5176 | 0.8875 | 0.7902 | 0.8214 | 0.8026 | 0.5089 | 0.7753 | 0.4224 | 0.7022
2024-03-01 | Vary | 0.7916 | 0.7415 | 0.7949 | 0.7378 | 0.8475 | 0.8101 | 0.6671 | 0.6552 | 0.7471 | 0.7888
2025-01-10 | llama | 0.7902 | 0.6923 | 0.8167 | 0.7716 | 0.8230 | 0.7710 | 0.6595 | 0.7343 | 0.5402 | 0.7287
2022-01-07 | LayoutLMV2-large on Textract | 0.7873 | 0.4924 | 0.8771 | 0.8218 | 0.7726 | 0.7661 | 0.4820 | 0.7276 | 0.3793 | 0.6983
2023-01-29 | LayoutLMv2_star_seg | 0.7859 | 0.5328 | 0.8406 | 0.7859 | 0.8128 | 0.7909 | 0.4879 | 0.6468 | 0.3644 | 0.6953
2024-05-21 | PaliGemma-3B (finetune, 448px) | 0.7802 | 0.6290 | 0.8553 | 0.7235 | 0.8336 | 0.7410 | 0.6787 | 0.7694 | 0.8276 | 0.7123
2023-05-25 | YoBerDaV2 Single-page | 0.7749 | 0.4737 | 0.8894 | 0.7586 | 0.7962 | 0.7398 | 0.4763 | 0.7173 | 0.7586 | 0.6976
2020-05-14 | Structural LM-v2 | 0.7674 | 0.4931 | 0.8381 | 0.7621 | 0.7924 | 0.7596 | 0.4756 | 0.6282 | 0.5517 | 0.6549
2024-10-09 | llama3-intern6b | 0.7670 | 0.6419 | 0.8360 | 0.7036 | 0.8455 | 0.7184 | 0.6323 | 0.7203 | 0.5862 | 0.7121
2022-09-18 | pix2struct-large | 0.7656 | 0.4424 | 0.8827 | 0.7702 | 0.7774 | 0.7085 | 0.5383 | 0.6320 | 0.7586 | 0.6536
2022-12-28 | Submission_ErnieLayout_base_finetuned_on_DocVQA_en_train_dev_textract_word_segments_ck-14000 | 0.7599 | 0.4313 | 0.8678 | 0.7726 | 0.7641 | 0.7330 | 0.4598 | 0.6957 | 0.4828 | 0.6097
2024-09-18 | Gemma 2b + OCR | 0.7517 | 0.4797 | 0.8067 | 0.7147 | 0.7771 | 0.8311 | 0.4922 | 0.5978 | 0.4282 | 0.6948
2024-04-22 | DOLMA_multifinetuning | 0.7458 | 0.4964 | 0.8335 | 0.7234 | 0.7832 | 0.7044 | 0.4135 | 0.5815 | 0.5172 | 0.6593
2024-02-13 | instructblip | 0.7429 | 0.5158 | 0.7918 | 0.7019 | 0.7751 | 0.8088 | 0.5765 | 0.5892 | 0.5172 | 0.7062
2025-01-22 | Ivy-VL | 0.7417 | 0.5853 | 0.7874 | 0.6919 | 0.7856 | 0.7674 | 0.5753 | 0.6341 | 0.5243 | 0.7174
2025-01-22 | Ivy-VL-01 | 0.7417 | 0.5853 | 0.7874 | 0.6919 | 0.7856 | 0.7674 | 0.5753 | 0.6341 | 0.5243 | 0.7174
2020-05-15 | QA_Base_MRC_2 | 0.7415 | 0.4854 | 0.8015 | 0.6738 | 0.7943 | 0.8136 | 0.5740 | 0.5831 | 0.5287 | 0.7161
2024-07-31 | tixc-vqa | 0.7413 | 0.5732 | 0.7581 | 0.6967 | 0.7965 | 0.7738 | 0.4705 | 0.5396 | 0.5862 | 0.6927
2020-05-15 | QA_Base_MRC_1 | 0.7407 | 0.4890 | 0.7984 | 0.6675 | 0.7936 | 0.8131 | 0.5854 | 0.6099 | 0.4943 | 0.7384
2020-05-15 | QA_Base_MRC_4 | 0.7348 | 0.4735 | 0.8040 | 0.6647 | 0.7838 | 0.8043 | 0.5618 | 0.5810 | 0.4598 | 0.7332
2020-05-15 | QA_Base_MRC_3 | 0.7322 | 0.4852 | 0.7958 | 0.6562 | 0.7842 | 0.8044 | 0.5679 | 0.5730 | 0.4511 | 0.7171
2024-10-26 | 0713ap +gpt4o(no v) | 0.7309 | 0.5116 | 0.8018 | 0.7379 | 0.7305 | 0.7303 | 0.4696 | 0.6240 | 0.4598 | 0.7309
2024-01-22 | VisFocus-Base | 0.7285 | 0.3822 | 0.8695 | 0.7234 | 0.7508 | 0.6717 | 0.3656 | 0.6748 | 0.6897 | 0.5507
2020-05-15 | QA_Base_MRC_5 | 0.7274 | 0.4858 | 0.7877 | 0.6550 | 0.7754 | 0.8047 | 0.5405 | 0.5619 | 0.4598 | 0.7084
2024-05-22 | Dolma multifinetuning 7 | 0.7219 | 0.4532 | 0.8259 | 0.7036 | 0.7585 | 0.6677 | 0.4227 | 0.5740 | 0.5862 | 0.6452
2022-09-18 | pix2struct-base | 0.7213 | 0.4111 | 0.8386 | 0.7253 | 0.7503 | 0.6407 | 0.4211 | 0.5753 | 0.6552 | 0.5822
2024-10-26 | 1010ap +gpt4o(no v) | 0.7201 | 0.4800 | 0.7680 | 0.7335 | 0.7258 | 0.7333 | 0.4797 | 0.5767 | 0.6552 | 0.7111
2024-04-02 | MiniCPM-V-2 | 0.7187 | 0.6012 | 0.8062 | 0.6312 | 0.7880 | 0.6753 | 0.6834 | 0.6789 | 0.7586 | 0.6464
2023-01-27 | LayoutLM-base+GNN | 0.6984 | 0.4747 | 0.7973 | 0.6848 | 0.7322 | 0.6323 | 0.4398 | 0.5599 | 0.5431 | 0.5388
2021-12-05 | Electra Large Squad | 0.6961 | 0.4485 | 0.7703 | 0.6348 | 0.7364 | 0.7644 | 0.4594 | 0.5438 | 0.5172 | 0.6470
2023-05-25 | YoBerDaV1 Multi-page | 0.6904 | 0.3481 | 0.8335 | 0.6411 | 0.7253 | 0.6854 | 0.4191 | 0.6299 | 0.5517 | 0.6129
2020-05-16 | HyperDQA_V4 | 0.6893 | 0.3874 | 0.7792 | 0.6309 | 0.7478 | 0.7187 | 0.4867 | 0.5630 | 0.4138 | 0.5685
2020-05-16 | HyperDQA_V3 | 0.6769 | 0.3876 | 0.7774 | 0.6167 | 0.7332 | 0.6961 | 0.4296 | 0.5373 | 0.4138 | 0.5650
2023-07-06 | GPT3.5 | 0.6759 | 0.4741 | 0.7144 | 0.6524 | 0.7036 | 0.6858 | 0.5385 | 0.5038 | 0.5954 | 0.6660
2020-05-16 | HyperDQA_V2 | 0.6734 | 0.3818 | 0.7666 | 0.6110 | 0.7332 | 0.6867 | 0.4834 | 0.5560 | 0.3793 | 0.5902
2020-05-09 | HyperDQA_V1 | 0.6717 | 0.4013 | 0.7693 | 0.6197 | 0.7167 | 0.6922 | 0.3598 | 0.5596 | 0.4138 | 0.5504
2023-08-15 | LATIN-Tuning-Prompt + Alpaca (Zero-shot) | 0.6687 | 0.3732 | 0.7529 | 0.6545 | 0.6615 | 0.7463 | 0.5439 | 0.4941 | 0.3481 | 0.6831
2023-07-14 | donut_base | 0.6590 | 0.3960 | 0.8407 | 0.6604 | 0.6987 | 0.4630 | 0.2969 | 0.6964 | 0.0345 | 0.5057
2023-12-04 | ViTLP | 0.6588 | 0.3880 | 0.8220 | 0.6705 | 0.6962 | 0.4670 | 0.2973 | 0.6307 | 0.4483 | 0.4910
2023-12-21 | DocVQA: A Dataset for VQA on Document Images | 0.6566 | 0.3569 | 0.7645 | 0.5775 | 0.7000 | 0.7205 | 0.4220 | 0.4802 | 0.4483 | 0.6108
2022-09-22 | BROS_BASE (WebViCoB 6.4M) | 0.6563 | 0.3780 | 0.7757 | 0.6681 | 0.6557 | 0.6175 | 0.3497 | 0.5782 | 0.4224 | 0.5754
2023-09-24 | Layoutlm_DocVQA+Token_v2 | 0.6562 | 0.3935 | 0.7764 | 0.6228 | 0.6737 | 0.6711 | 0.3385 | 0.5109 | 0.5086 | 0.5515
2023-07-21 | donut_half_input_imageSize | 0.6536 | 0.3930 | 0.8366 | 0.6548 | 0.6950 | 0.4609 | 0.2486 | 0.6940 | 0.0345 | 0.4941
2021-12-04 | Bert Large | 0.6447 | 0.3502 | 0.7535 | 0.5488 | 0.6920 | 0.7266 | 0.4171 | 0.5254 | 0.5517 | 0.6076
2022-05-23 | Dessurt | 0.6322 | 0.3164 | 0.8058 | 0.6486 | 0.6520 | 0.4852 | 0.2862 | 0.5830 | 0.3793 | 0.4365
2024-01-09 | dolma | 0.6196 | 0.4003 | 0.7642 | 0.5805 | 0.6609 | 0.5247 | 0.3958 | 0.5596 | 0.5690 | 0.4972
2025-04-30 | Vlm(llama) | 0.5914 | 0.4167 | 0.7576 | 0.5024 | 0.6569 | 0.4938 | 0.3839 | 0.6258 | 0.5632 | 0.5198
2020-05-09 | bert fulldata fintuned | 0.5900 | 0.4169 | 0.6870 | 0.4269 | 0.6710 | 0.7315 | 0.5124 | 0.4900 | 0.4483 | 0.5907
2020-05-01 | bert finetuned | 0.5872 | 0.2986 | 0.7011 | 0.4849 | 0.6359 | 0.6933 | 0.4622 | 0.4751 | 0.4483 | 0.4895
2020-04-30 | HyperDQA_V0 | 0.5715 | 0.3131 | 0.6780 | 0.4732 | 0.6630 | 0.5716 | 0.3623 | 0.4351 | 0.3793 | 0.4941
2023-09-26 | LayoutLM_Docvqa+Token_v0 | 0.4980 | 0.2319 | 0.6035 | 0.4320 | 0.5684 | 0.4779 | 0.2768 | 0.3081 | 0.1293 | 0.4178
2022-04-27 | LayoutLMv2, Tesseract OCR eval (dataset OCR trained) | 0.4961 | 0.2544 | 0.5523 | 0.4177 | 0.5495 | 0.5914 | 0.2888 | 0.1361 | 0.2069 | 0.4187
2025-04-30 | Vis(llama) | 0.4919 | 0.2820 | 0.6123 | 0.4556 | 0.5167 | 0.4446 | 0.2262 | 0.5262 | 0.5287 | 0.4508
2022-03-29 | LayoutLMv2, Tesseract OCR eval (Tesseract OCR trained) | 0.4815 | 0.2253 | 0.5440 | 0.4216 | 0.5207 | 0.5709 | 0.2430 | 0.1353 | 0.3103 | 0.3859
2023-07-26 | donut_large_encoderSize_finetuned_20_epoch | 0.4673 | 0.2236 | 0.6691 | 0.4581 | 0.5026 | 0.2665 | 0.1356 | 0.4983 | 0.5734 | 0.3430
2020-04-27 | bert | 0.4557 | 0.2233 | 0.5259 | 0.2633 | 0.5113 | 0.7775 | 0.4859 | 0.3565 | 0.0345 | 0.5778
2020-05-16 | UGLIFT v0.1 (Clova OCR) | 0.4417 | 0.1766 | 0.5600 | 0.3178 | 0.5340 | 0.4520 | 0.2253 | 0.3573 | 0.4483 | 0.3356
2024-05-21 | PaliGemma-3B (finetune, 224px) | 0.4374 | 0.4025 | 0.4516 | 0.3236 | 0.5574 | 0.4055 | 0.3500 | 0.4077 | 0.6379 | 0.4066
2025-04-30 | HocrEN(Technique 2) - qwen7b | 0.4282 | 0.1242 | 0.3501 | 0.4594 | 0.4755 | 0.4479 | 0.2327 | 0.0400 | 0.2414 | 0.3793
2025-04-30 | HocrEN(Technique 2) - qwen14b | 0.3794 | 0.0859 | 0.3120 | 0.4120 | 0.4467 | 0.3449 | 0.1065 | 0.0450 | 0.1724 | 0.3581
2022-10-21 | Finetuning LayoutLMv3_Base | 0.3596 | 0.2102 | 0.4498 | 0.3858 | 0.3262 | 0.3496 | 0.1552 | 0.3404 | 0.0345 | 0.2706
2023-09-19 | testtest | 0.3569 | 0.3018 | 0.3407 | 0.2748 | 0.4693 | 0.3186 | 0.2682 | 0.2753 | 0.6207 | 0.3356
2020-05-14 | Plain BERT QA | 0.3524 | 0.1687 | 0.4489 | 0.2029 | 0.4321 | 0.4812 | 0.3517 | 0.3096 | 0.0345 | 0.3747
2020-05-16 | Clova OCR V0 | 0.3489 | 0.0977 | 0.4855 | 0.2670 | 0.3811 | 0.3958 | 0.2489 | 0.2875 | 0.0345 | 0.3062
2020-05-01 | HDNet | 0.3401 | 0.2040 | 0.4688 | 0.2181 | 0.4710 | 0.1916 | 0.2488 | 0.2736 | 0.1379 | 0.2458
2020-05-16 | CLOVA OCR | 0.3296 | 0.1246 | 0.4612 | 0.2455 | 0.3622 | 0.3746 | 0.1692 | 0.2736 | 0.0690 | 0.3205
2023-07-21 | donut_small_encoderSize_finetuned_20_epoch | 0.3157 | 0.1935 | 0.4417 | 0.2912 | 0.3400 | 0.2075 | 0.1495 | 0.2658 | 0.3103 | 0.2644
2020-04-29 | docVQAQV_V0.1 | 0.3016 | 0.2010 | 0.3898 | 0.3810 | 0.2933 | 0.0664 | 0.1842 | 0.2736 | 0.1586 | 0.1695
2025-04-30 | HocrEN(Technique 2) - qwen32b | 0.2931 | 0.0599 | 0.2225 | 0.2960 | 0.3636 | 0.2765 | 0.0605 | 0.0264 | 0.1379 | 0.2948
2025-05-10 | m-rope2 | 0.2676 | 0.2480 | 0.2885 | 0.2493 | 0.2836 | 0.2676 | 0.2282 | 0.2541 | 0.3793 | 0.2452
2025-04-30 | HocrEN(Technique 2) - llama | 0.2488 | 0.0747 | 0.2209 | 0.2086 | 0.3077 | 0.2775 | 0.1442 | 0.0516 | 0.2414 | 0.2046
2020-04-26 | docVQAQV_V0 | 0.2342 | 0.1646 | 0.3133 | 0.2623 | 0.2483 | 0.0549 | 0.2277 | 0.1856 | 0.1034 | 0.1635
2025-04-30 | HocrEN(Technique 2) - mistral | 0.1830 | 0.0340 | 0.2019 | 0.1499 | 0.2319 | 0.1716 | 0.0611 | 0.0219 | 0.0000 | 0.1638
2025-05-15 | gmini25 | 0.1714 | 0.1298 | 0.1693 | 0.1754 | 0.1808 | 0.1604 | 0.1736 | 0.2353 | 0.2069 | 0.1458
2025-05-15 | doubao15 | 0.1585 | 0.1141 | 0.1619 | 0.1599 | 0.1676 | 0.1460 | 0.1741 | 0.2194 | 0.2414 | 0.1430
2025-05-15 | claude37 | 0.1584 | 0.1153 | 0.1600 | 0.1594 | 0.1691 | 0.1480 | 0.1578 | 0.1868 | 0.2414 | 0.1458
2025-05-15 | gpt4o | 0.1541 | 0.1002 | 0.1563 | 0.1515 | 0.1685 | 0.1447 | 0.1811 | 0.1949 | 0.1724 | 0.1353
2025-05-15 | wenxin45 | 0.1477 | 0.1048 | 0.1494 | 0.1454 | 0.1655 | 0.1280 | 0.1663 | 0.1966 | 0.1034 | 0.1093
2021-02-08 | seq2seq | 0.1081 | 0.0758 | 0.1283 | 0.0829 | 0.1332 | 0.0822 | 0.0786 | 0.0779 | 0.4828 | 0.1052
2024-01-23 | lixiang-vlm-7b-handled | 0.0990 | 0.0478 | 0.0798 | 0.0348 | 0.1648 | 0.0863 | 0.1309 | 0.1395 | 0.5517 | 0.1191
2024-01-24 | lixiang-vlm-7b | 0.0631 | 0.0313 | 0.0693 | 0.0272 | 0.0894 | 0.0639 | 0.0122 | 0.1145 | 0.5517 | 0.0826
2024-01-21 | lixiang-vlm handled | 0.0536 | 0.0243 | 0.0272 | 0.0097 | 0.1084 | 0.0400 | 0.0605 | 0.0395 | 0.1034 | 0.0568
2024-01-21 | lixiang-vlm | 0.0264 | 0.0176 | 0.0123 | 0.0045 | 0.0502 | 0.0262 | 0.0078 | 0.0291 | 0.1034 | 0.0273
2020-06-16 | Test Submission | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
2024-09-11 | zs | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000

Ranking Graphic