Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Abstract
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.
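The abstract highlights adapting to unseen tasks via retrieval-augmented in-context learning. The sketch below illustrates the general idea of interleaving retrieved (audio, caption) demonstrations with a query audio clip into a few-shot prompt. It is a minimal illustration only: the `Demo` class, the `<audio:...>` placeholder tokens, and `build_icl_prompt` are hypothetical names assumed for this example, not the released Audio Flamingo interface.

```python
# Illustrative sketch of retrieval-augmented few-shot prompting for audio captioning.
# All names below (Demo, build_icl_prompt, the <audio:...> placeholder) are
# hypothetical and chosen for this example; they are not the Audio Flamingo API.

from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    audio_path: str  # path to a retrieved in-context example clip
    caption: str     # its reference caption


def build_icl_prompt(demos: List[Demo], query_audio: str) -> str:
    """Interleave retrieved (audio, caption) pairs with the query audio,
    mirroring the few-shot in-context setup described in the abstract."""
    parts = [f"<audio:{d.audio_path}> Caption: {d.caption}" for d in demos]
    parts.append(f"<audio:{query_audio}> Caption:")
    return "\n".join(parts)


if __name__ == "__main__":
    # A 4-shot prompt, analogous to the "Audio Flamingo (4-shot)" benchmark row below.
    demos = [Demo(f"retrieved_{i}.wav", f"placeholder caption {i}") for i in range(4)]
    print(build_icl_prompt(demos, "query.wav"))
```

In the paper's setting, the demonstrations would be selected by audio-embedding similarity to the query rather than fixed by hand; the point of the sketch is only the interleaved prompt structure.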
Code Repositories
https://github.com/NVIDIA/audio-flamingo
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| Acoustic Scene Classification on CochlScene | Audio Flamingo | 1:1 Accuracy: 0.830 |
| Audio Captioning on Clotho | Audio Flamingo (Pengi train set) | BLEU-4: 17.4, CIDEr: 0.489, METEOR: 18.7, ROUGE-L: 39.4, SPICE: 0.134, SPIDEr: 0.312 |
| Retrieval-Augmented Few-Shot In-Context Audio Captioning | Audio Flamingo (4-shot) | CIDEr: 0.518 |
| Zero-Shot Audio Captioning on AudioCaps | Audio Flamingo | BLEU-4: 14.3, CIDEr: 50.2, METEOR: 20.5, ROUGE-L: 40.8, SPICE: 15.1, SPIDEr: 32.6 |
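The SPIDEr values in the table can be checked from the reported CIDEr and SPICE scores, since SPIDEr is defined as the arithmetic mean of SPICE and CIDEr (Liu et al., 2017). Note that the Clotho row reports metrics on a 0–1 scale while the AudioCaps row appears to use a 0–100 scale; the sketch below just reproduces the arithmetic.

```python
def spider(cider: float, spice: float) -> float:
    """SPIDEr is the arithmetic mean of SPICE and CIDEr."""
    return 0.5 * (cider + spice)


# Clotho row:    0.5 * (0.489 + 0.134) = 0.3115  ->  0.312 reported
# AudioCaps row: 0.5 * (50.2  + 15.1)  = 32.65   ->  32.6  reported
print(spider(0.489, 0.134))  # 0.3115
print(spider(50.2, 15.1))    # 32.65
```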