Robust Reading Competition (RRC)
DocVQA 2020-23
Task 3 - Infographics VQA

Method: DeepSeek-VL2 (2024-12-13)

Authors: DeepSeek-AI

Affiliation: DeepSeek-AI

Description: DeepSeek-VL2

@misc{wu2024deepseekvl2mixtureofexpertsvisionlanguagemodels,
  title={DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding},
  author={Zhiyu Wu and Xiaokang Chen and Zizheng Pan and Xingchao Liu and Wen Liu and Damai Dai and Huazuo Gao and Yiyang Ma and Chengyue Wu and Bingxuan Wang and Zhenda Xie and Yu Wu and Kai Hu and Jiawei Wang and Yaofeng Sun and Yukun Li and Yishi Piao and Kang Guan and Aixin Liu and Xin Xie and Yuxiang You and Kai Dong and Xingkai Yu and Haowei Zhang and Liang Zhao and Yisong Wang and Chong Ruan},
  year={2024},
  eprint={2412.10302},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.10302}
}
