Visual Grounding on RefCOCO val
Evaluation metric: Accuracy (%)
Evaluation results
Performance of each model on this benchmark
| Model name | Accuracy (%) | Paper title |
| --- | --- | --- |
| Florence-2-large-ft | 93.4 | Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks |
| mPLUG-2 | 90.33 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video |
| X2-VLM (large) | 87.6 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| XFM (base) | 86.1 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks |
| X2-VLM (base) | 85.2 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks |
| X-VLM (base) | 84.51 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts |
Visual Grounding On Refcoco Val | SOTA | HyperAI超神经