method: Seed-VL-1.5 (2025-05-13)

Authors: Seed-VL

Affiliation: ByteDance

Description: Seed-VL-1.5

@misc{guo2025seed15vltechnicalreport,
  title={Seed1.5-VL Technical Report},
  author={Dong Guo and Faming Wu and Feida Zhu and Fuxing Leng and Guang Shi and Haobin Chen and Haoqi Fan and Jian Wang and Jianyu Jiang and Jiawei Wang and Jingji Chen and Jingjia Huang and Kang Lei and Liping Yuan and Lishu Luo and Pengfei Liu and Qinghao Ye and Rui Qian and Shen Yan and Shixiong Zhao and Shuai Peng and Shuangye Li and Sihang Yuan and Sijin Wu and Tianheng Cheng and Weiwei Liu and Wenqian Wang and Xianhan Zeng and Xiao Liu and Xiaobo Qin and Xiaohan Ding and Xiaojun Xiao and Xiaoying Zhang and Xuanwei Zhang and Xuehan Xiong and Yanghua Peng and Yangrui Chen and Yanwei Li and Yanxu Hu and Yi Lin and Yiyuan Hu and Yiyuan Zhang and Youbin Wu and Yu Li and Yudong Liu and Yue Ling and Yujia Qin and Zanbo Wang and Zhiwu He and Aoxue Zhang and Bairen Yi and Bencheng Liao and Can Huang and Can Zhang and Chaorui Deng and Chaoyi Deng and Cheng Lin and Cheng Yuan and Chenggang Li and Chenhui Gou and Chenwei Lou and Chengzhi Wei and Chundian Liu and Chunyuan Li and Deyao Zhu and Donghong Zhong and Feng Li and Feng Zhang and Gang Wu and Guodong Li and Guohong Xiao and Haibin Lin and Haihua Yang and Haoming Wang and Heng Ji and Hongxiang Hao and Hui Shen and Huixia Li and Jiahao Li and Jialong Wu and Jianhua Zhu and Jianpeng Jiao and Jiashi Feng and Jiaze Chen and Jianhui Duan and Jihao Liu and Jin Zeng and Jingqun Tang and Jingyu Sun and Joya Chen and Jun Long and Junda Feng and Junfeng Zhan and Junjie Fang and Junting Lu and Kai Hua and Kai Liu and Kai Shen and Kaiyuan Zhang and Ke Shen and Ke Wang and Keyu Pan and Kun Zhang and Kunchang Li and Lanxin Li and Lei Li and Lei Shi and Li Han and Liang Xiang and Liangqiang Chen and Lin Chen and Lin Li and Lin Yan and Liying Chi and Longxiang Liu and Mengfei Du and Mingxuan Wang and Ningxin Pan and Peibin Chen and Pengfei Chen and Pengfei Wu and Qingqing Yuan and Qingyao Shuai and Qiuyan Tao and Renjie Zheng and Renrui Zhang and Ru Zhang and Rui Wang and Rui Yang and Rui Zhao and Shaoqiang Xu and Shihao Liang and Shipeng Yan and Shu Zhong and Shuaishuai Cao and Shuangzhi Wu and Shufan Liu and Shuhan Chang and Songhua Cai and Tenglong Ao and Tianhao Yang and Tingting Zhang and Wanjun Zhong and Wei Jia and Wei Weng and Weihao Yu and Wenhao Huang and Wenjia Zhu and Wenli Yang and Wenzhi Wang and Xiang Long and XiangRui Yin and Xiao Li and Xiaolei Zhu and Xiaoying Jia and Xijin Zhang and Xin Liu and Xinchen Zhang and Xinyu Yang and Xiongcai Luo and Xiuli Chen and Xuantong Zhong and Xuefeng Xiao and Xujing Li and Yan Wu and Yawei Wen and Yifan Du and Yihao Zhang and Yining Ye and Yonghui Wu and Yu Liu and Yu Yue and Yufeng Zhou and Yufeng Yuan and Yuhang Xu and Yuhong Yang and Yun Zhang and Yunhao Fang and Yuntao Li and Yurui Ren and Yuwen Xiong and Zehua Hong and Zehua Wang and Zewei Sun and Zeyu Wang and Zhao Cai and Zhaoyue Zha and Zhecheng An and Zhehui Zhao and Zhengzhuo Xu and Zhipeng Chen and Zhiyong Wu and Zhuofan Zheng and Zihao Wang and Zilong Huang and Ziyu Zhu and Zuquan Song},
  year={2025},
  eprint={2505.07062},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.07062}
}

method: MiMo-VL-7B-RL (2025-06-04)

Authors: Xiaomi LLM-Core

Affiliation: Xiaomi

Description: We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models
delivering state-of-the-art performance in both general visual understanding and multimodal
reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and
scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding
applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized
models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with
Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify
the importance of incorporating high-quality reasoning data with long Chain-of-Thought into
pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain
optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote
reproducibility and advance the field. The model checkpoints and full evaluation suite are
available at https://github.com/XiaomiMiMo/MiMo-VL.

Ranking Table

Columns: Date, Method, and overall Score, followed by per-category scores grouped as Answer type (Image span, Question span, Multiple spans, Non span), Evidence (Table/List, Textual, Visual object, Figure, Map), and Operation (Comparison, Arithmetic, Counting).

| Date | Method | Score | Image span | Question span | Multiple spans | Non span | Table/List | Textual | Visual object | Figure | Map | Comparison | Arithmetic | Counting |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-03-02 | Human Performance | 0.9718 | 0.9745 | 0.9777 | 0.9335 | 0.9716 | 0.9780 | 0.9789 | 0.9770 | 0.9699 | 0.9433 | 0.9712 | 0.9837 | 0.9544 |
| 2025-05-13 | Seed-VL-1.5 | 0.9120 | 0.9252 | 0.9298 | 0.8403 | 0.8778 | 0.9371 | 0.9571 | 0.8749 | 0.8925 | 0.8503 | 0.8728 | 0.9396 | 0.8246 |
| 2025-06-04 | MiMo-VL-7B-RL | 0.8806 | 0.9005 | 0.8949 | 0.8446 | 0.8092 | 0.9008 | 0.9332 | 0.8373 | 0.8577 | 0.7298 | 0.8264 | 0.8913 | 0.7439 |
| 2025-01-27 | qwen2.5vl | 0.8727 | 0.8983 | 0.9108 | 0.7956 | 0.7737 | 0.8735 | 0.9502 | 0.8462 | 0.8507 | 0.7593 | 0.7695 | 0.9217 | 0.6228 |
| 2024-07-12 | qwen2-vl | 0.8469 | 0.8739 | 0.8708 | 0.7778 | 0.7424 | 0.8596 | 0.9430 | 0.7827 | 0.8170 | 0.7592 | 0.7295 | 0.8977 | 0.5793 |
| 2024-12-24 | InternVL2.5-78B-MPO (generalist) | 0.8428 | 0.8765 | 0.8753 | 0.6977 | 0.7357 | 0.8313 | 0.9247 | 0.8398 | 0.8229 | 0.7338 | 0.7280 | 0.8812 | 0.5865 |
| 2024-06-30 | InternVL2-Pro (generalist) | 0.8334 | 0.8681 | 0.8929 | 0.7350 | 0.6969 | 0.8335 | 0.9260 | 0.7757 | 0.8093 | 0.7186 | 0.7301 | 0.8584 | 0.5368 |
| 2024-09-25 | Molmo-72B | 0.8186 | 0.8513 | 0.8827 | 0.6821 | 0.7041 | 0.8184 | 0.9136 | 0.8062 | 0.7945 | 0.6960 | 0.7054 | 0.8188 | 0.5930 |
| 2025-04-03 | test | 0.8041 | 0.8418 | 0.8571 | 0.6851 | 0.6622 | 0.8067 | 0.8987 | 0.7940 | 0.7776 | 0.6411 | 0.7069 | 0.8047 | 0.5134 |
| 2025-01-10 | VideoLLaMA3-7B | 0.7893 | 0.8269 | 0.8358 | 0.6845 | 0.6447 | 0.7936 | 0.9165 | 0.7446 | 0.7499 | 0.6661 | 0.6411 | 0.7785 | 0.5179 |
| 2024-12-13 | DeepSeek-VL2 | 0.7814 | 0.8189 | 0.8010 | 0.6989 | 0.6363 | 0.7935 | 0.9041 | 0.7371 | 0.7434 | 0.6327 | 0.6206 | 0.7282 | 0.5326 |
| 2025-03-27 | Qwen/Qwen2.5-VL-7B-Instruct | 0.7765 | 0.8107 | 0.8179 | 0.6657 | 0.6351 | 0.7737 | 0.8975 | 0.7578 | 0.7391 | 0.6819 | 0.6298 | 0.7835 | 0.4883 |
| 2024-04-27 | InternVL-1.5-Plus (generalist) | 0.7574 | 0.7989 | 0.8124 | 0.6425 | 0.5987 | 0.7544 | 0.8733 | 0.7306 | 0.7234 | 0.6216 | 0.6065 | 0.7386 | 0.4623 |
| 2024-01-24 | qwenvl-max (single generalist model) | 0.7341 | 0.7756 | 0.8083 | 0.6035 | 0.5717 | 0.7291 | 0.8856 | 0.6708 | 0.6892 | 0.5967 | 0.6009 | 0.7152 | 0.4388 |
| 2024-05-31 | GPT-4 Vision Turbo + Amazon Textract OCR | 0.7191 | 0.7575 | 0.7795 | 0.6591 | 0.5553 | 0.7183 | 0.8201 | 0.6696 | 0.6904 | 0.6926 | 0.5815 | 0.6759 | 0.4281 |
| 2023-07-05 | RALLM | 0.7175 | 0.7421 | 0.7884 | 0.0830 | 0.8031 | 0.6866 | 0.7088 | 0.7376 | 0.7214 | 0.8049 | 0.7141 | 0.8038 | 0.7916 |
| 2024-11-01 | MLCD-Embodied-7B: Multi-label Cluster Discrimination for Visual Representation Learning | 0.6998 | 0.7330 | 0.7930 | 0.5955 | 0.5564 | 0.6951 | 0.8271 | 0.6654 | 0.6614 | 0.5495 | 0.5523 | 0.6350 | 0.4905 |
| 2024-04-02 | InternLM-XComposer2-4KHD-7B | 0.6855 | 0.7336 | 0.7570 | 0.5151 | 0.5124 | 0.6643 | 0.8240 | 0.6598 | 0.6471 | 0.5241 | 0.5120 | 0.6636 | 0.3610 |
| 2023-11-15 | SMoLA-PaLI-X Specialist Model | 0.6621 | 0.7166 | 0.7252 | 0.5838 | 0.4292 | 0.6448 | 0.8261 | 0.6714 | 0.6110 | 0.5065 | 0.5238 | 0.5054 | 0.3506 |
| 2024-02-10 | ScreenAI 5B | 0.6590 | 0.7162 | 0.7247 | 0.5734 | 0.4140 | 0.6525 | 0.8315 | 0.5968 | 0.6020 | 0.4467 | 0.4815 | 0.5303 | 0.3000 |
| 2023-12-07 | SMoLA-PaLI-X Generalist Model | 0.6556 | 0.7107 | 0.7228 | 0.5642 | 0.4197 | 0.6200 | 0.8237 | 0.6710 | 0.6095 | 0.5246 | 0.5159 | 0.4988 | 0.3372 |
| 2024-09-08 | neetolab-sota-v1 | 0.6195 | 0.6620 | 0.7021 | 0.4814 | 0.4513 | 0.6015 | 0.7652 | 0.5505 | 0.5776 | 0.4996 | 0.4676 | 0.5528 | 0.3491 |
| 2021-04-11 | Applica.ai TILT | 0.6120 | 0.6765 | 0.6419 | 0.4391 | 0.3832 | 0.5917 | 0.7916 | 0.4545 | 0.5654 | 0.4480 | 0.4801 | 0.4958 | 0.2652 |
| 2024-07-22 | Snowflake Arctic-TILT 0.8B | 0.5695 | 0.6274 | 0.6074 | 0.4123 | 0.3653 | 0.5478 | 0.7530 | 0.4204 | 0.5109 | 0.4410 | 0.4350 | 0.5042 | 0.2238 |
| 2023-08-20 | PaLI-X (Google Research, Single Generative Model) | 0.5477 | 0.5940 | 0.6950 | 0.4122 | 0.3534 | 0.5145 | 0.6891 | 0.6373 | 0.5040 | 0.4013 | 0.4290 | 0.4053 | 0.3091 |
| 2025-03-27 | OpenGVLab/InternVL2_5-8B | 0.5182 | 0.5455 | 0.6538 | 0.3236 | 0.4097 | 0.5039 | 0.6116 | 0.5082 | 0.4898 | 0.4623 | 0.3923 | 0.5262 | 0.2852 |
| 2024-05-21 | PaliGemma-3B (finetune, 896px) | 0.4775 | 0.5214 | 0.5372 | 0.3301 | 0.3220 | 0.4500 | 0.6057 | 0.4252 | 0.4377 | 0.3690 | 0.3742 | 0.3924 | 0.2507 |
| 2024-07-26 | loixc-vqa | 0.4715 | 0.5000 | 0.6815 | 0.3250 | 0.3309 | 0.4521 | 0.5853 | 0.4108 | 0.4364 | 0.3612 | 0.4006 | 0.3919 | 0.2505 |
| 2024-10-09 | llama3-qwenvit | 0.4329 | 0.5077 | 0.5162 | 0.2329 | 0.1650 | 0.4207 | 0.5568 | 0.4785 | 0.4053 | 0.3014 | 0.3371 | 0.1311 | 0.2118 |
| 2023-10-09 | nnrc_udop_224 | 0.4299 | 0.4716 | 0.5279 | 0.2410 | 0.2785 | 0.3740 | 0.5755 | 0.3475 | 0.3944 | 0.3347 | 0.2997 | 0.3583 | 0.1866 |
| 2024-05-21 | PaliGemma-3B (finetune, 448px) | 0.4047 | 0.4275 | 0.5801 | 0.2560 | 0.3007 | 0.4010 | 0.4853 | 0.3898 | 0.3742 | 0.3178 | 0.3530 | 0.3360 | 0.2517 |
| 2022-09-18 | pix2struct-large | 0.4001 | 0.4308 | 0.4839 | 0.2059 | 0.3173 | 0.3833 | 0.5256 | 0.2572 | 0.3726 | 0.3283 | 0.2762 | 0.4198 | 0.2017 |
| 2024-07-31 | tixc-vqa | 0.3975 | 0.4264 | 0.6092 | 0.2620 | 0.2496 | 0.3693 | 0.4798 | 0.3826 | 0.3704 | 0.3172 | 0.3571 | 0.2927 | 0.1965 |
| 2021-04-09 | IG-BERT (single model) | 0.3854 | 0.4181 | 0.4481 | 0.2197 | 0.2849 | 0.3373 | 0.5016 | 0.3013 | 0.3706 | 0.3347 | 0.2939 | 0.3564 | 0.2000 |
| 2022-09-18 | pix2struct-base | 0.3820 | 0.4145 | 0.4381 | 0.1655 | 0.3014 | 0.3351 | 0.4971 | 0.2380 | 0.3632 | 0.3257 | 0.2344 | 0.4036 | 0.1888 |
| 2024-10-09 | llama3-internvit | 0.3749 | 0.4294 | 0.5715 | 0.1641 | 0.1627 | 0.3721 | 0.4580 | 0.4741 | 0.3385 | 0.2350 | 0.3329 | 0.1114 | 0.2109 |
| 2024-04-23 | dolma_multifinetuning | 0.3633 | 0.3832 | 0.5660 | 0.2045 | 0.2657 | 0.3284 | 0.4570 | 0.4042 | 0.3329 | 0.2174 | 0.3117 | 0.2731 | 0.2491 |
| 2021-04-11 | NAVER CLOVA | 0.3219 | 0.3996 | 0.2317 | 0.1064 | 0.1068 | 0.2653 | 0.4488 | 0.1878 | 0.3095 | 0.3231 | 0.2020 | 0.1480 | 0.0695 |
| 2021-04-10 | Ensemble LM and VLM | 0.2853 | 0.3337 | 0.4181 | 0.0748 | 0.1169 | 0.2439 | 0.3649 | 0.2331 | 0.2645 | 0.2845 | 0.2580 | 0.1628 | 0.0647 |
| 2024-05-21 | PaliGemma-3B (finetune, 224px) | 0.2846 | 0.2888 | 0.5024 | 0.1567 | 0.2425 | 0.2675 | 0.3206 | 0.3164 | 0.2609 | 0.2406 | 0.2979 | 0.2025 | 0.2730 |
| 2021-11-09 | LayoutLMv2 LARGE | 0.2829 | 0.3430 | 0.2763 | 0.0641 | 0.1114 | 0.2449 | 0.3855 | 0.1440 | 0.2601 | 0.3110 | 0.1897 | 0.1130 | 0.1158 |
| 2022-09-20 | BROS_BASE (WebViCoB 1M) | 0.2809 | 0.3436 | 0.2485 | 0.0277 | 0.1303 | 0.2545 | 0.3620 | 0.1318 | 0.2767 | 0.2886 | 0.2207 | 0.1745 | 0.0854 |
| 2022-03-03 | InfographicVQA paper model | 0.2720 | 0.3278 | 0.2386 | 0.0450 | 0.1371 | 0.2400 | 0.3626 | 0.1705 | 0.2551 | 0.2205 | 0.1836 | 0.1559 | 0.1140 |
| 2021-04-05 | BERT fuzzy search | 0.2078 | 0.2625 | 0.2333 | 0.0739 | 0.0259 | 0.1852 | 0.2995 | 0.0896 | 0.1942 | 0.1709 | 0.1805 | 0.0160 | 0.0436 |
| 2025-05-10 | m-rope2 | 0.1972 | 0.2143 | 0.3866 | 0.0382 | 0.1236 | 0.1759 | 0.2388 | 0.2233 | 0.1860 | 0.1815 | 0.2084 | 0.1178 | 0.1234 |
| 2021-04-10 | BERT | 0.1678 | 0.2149 | 0.2117 | 0.0126 | 0.0152 | 0.1479 | 0.2450 | 0.1054 | 0.1505 | 0.1768 | 0.1578 | 0.0158 | 0.0185 |
| 2024-07-13 | 0710 | 0.1407 | 0.1449 | 0.2181 | 0.0674 | 0.1252 | 0.1294 | 0.1612 | 0.1334 | 0.1368 | 0.1041 | 0.1261 | 0.1397 | 0.1072 |
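The flat listing above can be hard to scan: each row carries one overall Score plus thirteen per-category sub-scores, and the leaderboard is ordered by the overall Score alone. A minimal sketch of that structure, using three rows transcribed from the table (the tuple here keeps only the four Answer-type sub-scores for brevity; standard library only):

```python
# A few rows transcribed from the ranking table above: date, method,
# overall Score, and the Answer-type sub-scores
# (Image span, Question span, Multiple spans, Non span).
rows = [
    ("2025-06-04", "MiMo-VL-7B-RL",     0.8806, (0.9005, 0.8949, 0.8446, 0.8092)),
    ("2022-03-02", "Human Performance", 0.9718, (0.9745, 0.9777, 0.9335, 0.9716)),
    ("2025-05-13", "Seed-VL-1.5",       0.9120, (0.9252, 0.9298, 0.8403, 0.8778)),
]

# The leaderboard orders entries by the overall Score, best first;
# the per-category columns are informational and do not affect rank.
ranking = sorted(rows, key=lambda r: r[2], reverse=True)
for pos, (date, method, score, _) in enumerate(ranking, start=1):
    print(f"{pos}. {method} ({date}): {score:.4f}")
```

Sorting only by the third field reproduces the published order for these rows: Human Performance, then Seed-VL-1.5, then MiMo-VL-7B-RL.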

Ranking Graphic (interactive chart not reproduced in this text export)