R.R.C. Robust Reading Competition
    DocVQA 2020-23
Task 3 - Infographics VQA - Method: pix2struct-base
    Sample 1 of 3288
Sample Ranking                                      Score
Applica.ai TILT                                     1.00
Human Performance                                   1.00
RALLM                                               1.00
SMoLA-PaLI-X Specialist Model                       1.00
SMoLA-PaLI-X Generalist Model                       1.00
InternVL2-Pro (generalist)                          1.00
Molmo-72B                                           1.00
InternVL2.5-78B-MPO (generalist)                    1.00
VideoLLaMA3-7B                                      1.00
qwen2.5vl                                           1.00
OpenGVLab/InternVL2_5-8B                            1.00
Qwen/Qwen2.5-VL-7B-Instruct                         1.00
Seed-VL-1.5                                         1.00
qwenvl-max (single generalist model)                0.57
BERT fuzzy search                                   0.00
IG-BERT (single model)                              0.00
BERT                                                0.00
Ensemble LM and VLM                                 0.00
NAVER CLOVA                                         0.00
LayoutLMv2 LARGE                                    0.00
InfographicVQA paper model                          0.00
pix2struct-large                                    0.00
pix2struct-base                                     0.00
BROS_BASE (WebViCoB 1M)                             0.00
PaLI-X (Google Research, Single Generative Model)   0.00
nnrc_udop_224                                       0.00
ScreenAI 5B                                         0.00
InternLM-XComposer2-4KHD-7B                         0.00
dolma_multifinetuning                               0.00
InternVL-1.5-Plus (generalist)                      0.00
PaliGemma-3B (finetune, 448px)                      0.00
PaliGemma-3B (finetune, 224px)                      0.00
PaliGemma-3B (finetune, 896px)                      0.00
GPT-4 Vision Turbo + Amazon Textract OCR            0.00
qwen2-vl                                            0.00
0710                                                0.00
Snowflake Arctic-TILT 0.8B                          0.00
loixc-vqa                                           0.00
tixc-vqa                                            0.00
neetolab-sota-v1                                    0.00
llama3-internvit                                    0.00
llama3-qwenvit                                      0.00
MLCD-Embodied-7B: Multi-label Cluster Discrimination for Visual Representation Learning  0.00
DeepSeek-VL2                                        0.00
test                                                0.00
m-rope2                                             0.00