HyperAI超神经

Visual Instruction Following on LLaVA-Bench

Evaluation Metric

avg score

Evaluation Results

Performance of each model on this benchmark

Model	avg score	Paper Title
CuMo-7B	85.7	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
ShareGPT4V-13B	79.9	ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V-7B	72.6	ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
LLaVA-v1.5-13B	70.7	Improved Baselines with Visual Instruction Tuning
LLaVA-v1.5-7B	63.4	Improved Baselines with Visual Instruction Tuning
InstructBLIP-7B	60.9	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
InstructBLIP-13B	58.2	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
BLIP-2	38.1	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

