method: SMoLA-PaLI-X Generalist Model
Date: 2023-12-06
Authors: SMoLA PaLI Team
Affiliation: Google Research
Description: Omni-SMoLA uses the Soft MoE approach to (softly) mix many multimodal low rank experts.
method: PaLI-X
Date: 2023-05-31
Authors: Xi Chen et al.
Affiliation: Google Research
Description: Scaled-up PaLI-X model
method: PaliGemma-3B (finetune, 896px)
Date: 2024-05-21
Date | Method | Score
---|---|---
2023-12-06 | SMoLA-PaLI-X Generalist Model | 0.8603
2023-05-31 | PaLI-X | 0.8446
2024-05-21 | PaliGemma-3B (finetune, 896px) | 0.8440
2024-05-21 | PaliGemma-3B (finetune, 448px) | 0.8182
2022-09-21 | PaLI-17B | 0.7987
2022-09-21 | PaLI-15B | 0.7652
2022-07-26 | GIT2, Single Model | 0.7577
2022-09-20 | PaLI-3B | 0.6972
2022-05-27 | GIT, Single Model | 0.6964
2022-11-07 | unitnt blip | 0.6633
2022-09-20 | PreSTU CC15M-SplitOCR B+B | 0.6546
2024-05-21 | PaliGemma-3B (finetune, 224px) | 0.6329
2022-08-15 | LTG | 0.6089
2022-07-28 | TAG | 0.6019
2020-12-08 | TAP | 0.5967
2022-08-03 | ROQ | 0.5926
2022-11-25 | micro_60 | 0.5882
2022-03-15 | TWA | 0.5774
2024-01-12 | tap_visualbert | 0.5691
2020-09-09 | ssbaseline | 0.5500
2022-07-11 | danet | 0.5312
2022-10-22 | KgMr | 0.5273
2021-04-07 | m4c_demo | 0.5256
2022-12-08 | a | 0.5245
2021-07-20 | DXM_DI_AI_CV_NLP | 0.5077
2020-08-15 | TIG | 0.5051
2020-05-25 | SA-M4C | 0.5042
2021-02-10 | M4C-CVL | 0.5034
2020-07-28 | vm | 0.4916
2020-04-02 | masker | 0.4828
2020-03-01 | SMA | 0.4659
2019-11-02 | M4C (single model) | 0.4621
2019-11-15 | RUArt | 0.3108
2019-04-30 | VTA | 0.2820
2019-04-30 | QAQ | 0.2563
2019-04-22 | Clova AI OCR | 0.2155
2019-11-10 | MM-GNN | 0.2071
2019-04-29 | USTB-TQA | 0.1702
2019-04-29 | USTB-TVQA | 0.0952
2019-04-29 | Focus: A bottom-up approach for Scene Text VQA | 0.0882