Audio Generation On Audiocaps
Metrics
FAD (Fréchet Audio Distance)
FD (Fréchet Distance)
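Both metrics are usually computed as a Fréchet distance between Gaussian fits to embeddings of reference audio and generated audio. The page does not say which embedding model or evaluation protocol each paper used, so the following is only a minimal sketch of the distance itself, assuming the embeddings have already been extracted as NumPy arrays. The function name `frechet_distance` and the array shapes are illustrative assumptions, not part of any listed paper's toolkit.

```python
import numpy as np


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two embedding sets.

    Both inputs are assumed to have shape (n_clips, dim) with dim >= 2.
    How the embeddings are produced is outside the scope of this sketch.
    """
    mu_r = real_emb.mean(axis=0)
    mu_g = gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 4))   # stand-in for real-audio embeddings
    generated = reference + 1.0             # same spread, shifted mean
    print(frechet_distance(reference, reference))
    print(frechet_distance(reference, generated))
```

Identical sets give a distance near zero, and a pure mean shift contributes only its squared norm, which is why lower values indicate generated audio whose embedding statistics sit closer to the reference set.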
Results
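Performance results of various models on this benchmark. Lower is better for both FAD and FD; rows are listed from highest to lowest FAD, and a dash marks a value that is not reported.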
| Model Name | FAD | FD | Paper Title |
| --- | --- | --- | --- |
| Diffsound | 7.75 | 47.68 | Diffsound: Discrete Diffusion Model for Text-to-sound Generation |
| AudioGen | 3.13 | - | AudioGen: Textually Guided Audio Generation |
| Make-An-Audio | 2.66 | 18.32 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models |
| Tango-AF&AC-FT-AC | 2.54 | 17.19 | Improving Text-To-Audio Models with Synthetic Captions |
| ETTA | 2.51 | 13.12 | ETTA: Elucidating the Design Space of Text-to-Audio Models |
| Consistency TTA (Single-step generation) | 2.18 | 20.44 | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation |
| ETTA-FT-AC-100k | 2.03 | 10.10 | ETTA: Elucidating the Design Space of Text-to-Audio Models |
| AudioLDM2-large | 2.02 | 26.18 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining |
| AudioLDM-L-Full | 1.96 | 23.31 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models |
| Make-An-Audio 2 | 1.80 | 11.75 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation |
| CoDi | 1.80 | 22.90 | Any-to-Any Generation via Composable Diffusion |
| Auffusion-Full | 1.76 | 23.08 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation |
| Auffusion | 1.63 | 21.99 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation |
| TANGO | 1.59 | 24.52 | Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model |
| AudioLDM 2-AC-Large | 1.42 | - | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining |
| Re-AudioLDM-L | 1.37 | - | Retrieval-Augmented Text-to-Audio Generation |
| GenAu-Large | 1.21 | 16.51 | Taming Data and Transformers for Audio Generation |
| Audiobox Sound | 0.77 | 8.30 | Audiobox: Unified Audio Generation with Natural Language Prompts |
| Stable Audio | - | - | Fast Timing-Conditioned Latent Audio Diffusion |
| TangoFlux | - | - | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization |
Audio Generation On Audiocaps | SOTA | HyperAI