HyperAI
Visual Question Answering on MM-Vet
Evaluation metric: GPT-4 score
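The page names the metric but does not define it. As a hedged sketch, assuming MM-Vet's usual setup in which GPT-4 grades each open-ended answer on a 0 to 1 scale and the benchmark score is the mean grade scaled to 0-100, the aggregation could look like the code below. Reading the "±" figures as the standard deviation of the score across repeated GPT-4 grading runs is also an assumption, as is the choice of population standard deviation; the function names `benchmark_score` and `score_with_spread` and the toy grades are hypothetical.

```python
"""Hypothetical sketch of MM-Vet-style score aggregation.

Assumptions (not stated on the leaderboard page): GPT-4 grades each sample
with a value in [0, 1]; the benchmark score is the mean grade times 100;
the "±" figure is the population standard deviation of that score across
repeated grading runs.
"""
from statistics import mean, pstdev


def benchmark_score(sample_grades: list[float]) -> float:
    """Average per-sample grades in [0, 1] and scale to a 0-100 score."""
    if not sample_grades:
        raise ValueError("need at least one graded sample")
    if any(g < 0.0 or g > 1.0 for g in sample_grades):
        raise ValueError("grades must lie in [0, 1]")
    return 100.0 * mean(sample_grades)


def score_with_spread(runs: list[list[float]]) -> tuple[float, float]:
    """Score each grading run separately, then return (mean, std dev)."""
    scores = [benchmark_score(run) for run in runs]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    # Made-up grades for 4 samples, each graded in 3 independent runs.
    runs = [
        [1.0, 0.5, 0.0, 1.0],
        [1.0, 0.6, 0.0, 1.0],
        [1.0, 0.4, 0.0, 1.0],
    ]
    avg, spread = score_with_spread(runs)
    print(f"{avg:.1f}±{spread:.1f}")
```

Scoring each run before taking the spread keeps the "±" value on the same 0-100 scale as the headline score, which matches how the table reports figures such as 67.7±0.3.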
Results
Performance of each model on this benchmark:
| Model | GPT-4 score | Paper title |
|---|---|---|
| gemini-2.0-flash-exp | 81.2±0.4 | - |
| gemini-exp-1206 | 78.1±0.2 | - |
| Gemini 1.5 Pro (gemini-1.5-pro-002) | 76.9±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
| MMCTAgent (GPT-4 + GPT-4V) | 74.24 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning |
| Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | 74.2±0.2 | Claude 3.5 Sonnet Model Card Addendum |
| Qwen2-VL-72B | 74.0 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| InternVL2.5-78B | 72.3 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling |
| GPT-4o +text rationale +IoT | 72.2 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models |
| Lyra-Pro | 71.4 | Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition |
| GLM-4V-Plus | 71.1 | CogVLM2: Visual Language Models for Image and Video Understanding |
| Phantom-7B | 70.8 | Phantom of Latent for Large Language and Vision Models |
| GPT-4o (gpt-4o-2024-05-13) | 69.3±0.1 | GPT-4 Technical Report |
| InternVL2.5-38B | 68.8 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling |
| gpt-4o-mini-2024-07-18 | 68.6±0.1 | GPT-4 Technical Report |
| GPT-4V | 67.7±0.3 | GPT-4 Technical Report |
| GPT-4V-Turbo-detail:high | 67.6±0.1 | GPT-4 Technical Report |
| Qwen-VL-Max | 66.6±0.5 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| Gemini 1.5 Pro (gemini-1.5-pro) | 65.8±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
| InternVL2-26B (SGP, token ratio 64%) | 65.60 | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs |
| Baichuan-Omni (7B) | 65.4 | Baichuan-Omni Technical Report |
The table shows the top 20 of 229 entries.