Visual Grounding on RefCOCO testA
Metrics
IoU
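For readers unfamiliar with the metric, IoU (intersection over union) in its usual form is the overlap area between a predicted box and a ground-truth box divided by the area of their union. The sketch below is only illustrative: the corner format `(x1, y1, x2, y2)` and the function name `box_iou` are assumptions, and this page does not state how per-example IoU values are aggregated into the reported score.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, gt: Box) -> float:
    """Return intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(pred[0], gt[0])
    iy1 = max(pred[1], gt[1])
    ix2 = min(pred[2], gt[2])
    iy2 = min(pred[3], gt[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_gt = max(0.0, gt[2] - gt[0]) * max(0.0, gt[3] - gt[1])
    union = area_pred + area_gt - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A prediction is often counted as correct when its IoU with the ground truth exceeds a fixed threshold, but whether this leaderboard uses such a threshold is not stated here.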
Results
Performance results of various models on this benchmark
Model Name | IoU | Paper Title
HYDRA | 61.1 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
XFM (base) | - | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
X2-VLM (large) | - | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
X2-VLM (base) | - | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
X-VLM (base) | - | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
mPLUG-2 | - | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Florence-2-large-ft | - | Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Visual Grounding On Refcoco Testa | SOTA | HyperAI