HyperAI
Image Captioning on COCO
Metrics: BLEU-4, CIDEr
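The page names the metrics but does not define them. To give a rough sense of what the BLEU-4 column measures, here is a minimal Python sketch of corpus-level BLEU-4 as defined by Papineni et al.: the geometric mean of clipped 1- to 4-gram precisions with uniform weights, multiplied by a brevity penalty. It assumes captions are already tokenized into word lists. The function names `ngrams` and `corpus_bleu4` are hypothetical, and this is not the official COCO caption evaluation code, whose tokenizer and implementation details may produce different numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references_list):
    """Corpus-level BLEU-4 (uniform weights, brevity penalty, no smoothing).

    candidates: one tokenized generated caption per image.
    references_list: for each image, a list of tokenized reference captions.
    Returns a score in [0, 1]; leaderboards usually report it multiplied by 100.
    """
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length
        # to the candidate (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram by its largest count in any one reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += max(len(cand) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)
```

For example, `corpus_bleu4([["a", "cat", "on", "a", "mat"]], [[["a", "cat", "on", "a", "mat"]]])` returns 1.0 for an exact match, which a leaderboard would list as 100. A candidate that matches the reference n-grams but is shorter than the reference is scaled down by the brevity penalty.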
Results
Performance results of various models on this benchmark, sorted by CIDEr. A dash (-) marks a value that is not listed.
| Model | BLEU-4 | CIDEr | Paper Title |
| --- | --- | --- | --- |
| ExpansionNet v2 | - | 143.7 | Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning |
| M2 Transformer | - | 131.2 | Meshed-Memory Transformer for Image Captioning |
| IGINet | 39.9 | 131.0 | - |
| UNIMO-large | 39.6 | 127.7 | UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning |
| RDN | - | 125.2 | Reflective Decoding Network for Image Captioning |
| Lyrics | - | 121.1 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects |
| Bit Diffusion (20 steps) | 34.7 | 115 | Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning |
| Flamingo (80B; 4-shot) | - | 103 | Retrieval-Augmented Multimodal Language Modeling |
| RA-CM3 (2.7B) | - | 89.1 | Retrieval-Augmented Multimodal Language Modeling |
| Flamingo (3B; 4-shot) | - | 85 | Retrieval-Augmented Multimodal Language Modeling |
| Parti | - | 83.9 | Retrieval-Augmented Multimodal Language Modeling |
| NIC (ResNet-50, CutMix) | 24.9 | 77.6 | CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features |
| Vanilla CM3 | - | 71.9 | Retrieval-Augmented Multimodal Language Modeling |
| X-LXMERT | - | 55.8 | Retrieval-Augmented Multimodal Language Modeling |
| minDALL-E | - | 48 | Retrieval-Augmented Multimodal Language Modeling |
| ruDALL-E-XL | - | 38.7 | Retrieval-Augmented Multimodal Language Modeling |
| DALL-E | - | 20.2 | Retrieval-Augmented Multimodal Language Modeling |