K R Prajwal*, Liliane Momeni*, Triantafyllos Afouras, Andrew Zisserman

Abstract
In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting. To this end, we investigate Transformer-based models that ingest two streams, a visual encoding of the video and a phonetic encoding of the keyword, and output the temporal location of the keyword if present. Our contributions are as follows: (1) We propose a novel architecture, the Transpotter, that uses full cross-modal attention between the visual and phonetic streams; (2) We show through extensive evaluations that our model outperforms prior state-of-the-art visual keyword spotting and lip reading methods on the challenging LRW, LRS2, and LRS3 datasets by a large margin; (3) We demonstrate the ability of our model to spot words under the extreme conditions of isolated mouthings in sign language videos.
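To make the two-stream design concrete, below is a minimal PyTorch sketch of this kind of model: pre-extracted visual features and a phoneme encoding of the query keyword are concatenated and passed through a single Transformer encoder, so self-attention mixes the two modalities, and the model emits per-frame localization scores plus a clip-level presence score. All module names, dimensions, and the pooling choice are illustrative assumptions; this is not the authors' released implementation of the Transpotter.

```python
# Sketch of a cross-modal keyword spotter: a Transformer attends jointly over
# video features and the keyword's phoneme sequence (assumed design, not the
# paper's exact architecture or hyperparameters).
import torch
import torch.nn as nn


class CrossModalKeywordSpotter(nn.Module):
    def __init__(self, visual_dim=512, num_phonemes=44, d_model=256,
                 nhead=4, num_layers=4, max_len=1000):
        super().__init__()
        # Project pre-extracted visual (lip) features into the model dimension.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Embed the keyword's phoneme IDs.
        self.phoneme_emb = nn.Embedding(num_phonemes, d_model)
        # Learned positional embeddings, plus a modality embedding so the
        # encoder can tell video tokens from phoneme tokens.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.modality_emb = nn.Embedding(2, d_model)
        # One encoder over the concatenated streams gives full
        # cross-modal (video <-> phoneme) attention.
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Per-frame score: is the keyword being mouthed at this frame?
        self.frame_head = nn.Linear(d_model, 1)
        # Clip-level score: is the keyword present anywhere in the clip?
        self.clip_head = nn.Linear(d_model, 1)

    def forward(self, visual_feats, phoneme_ids):
        # visual_feats: (B, T, visual_dim), phoneme_ids: (B, P)
        B, T, _ = visual_feats.shape
        P = phoneme_ids.shape[1]

        v = self.visual_proj(visual_feats)
        p = self.phoneme_emb(phoneme_ids)

        v = v + self.pos_emb(torch.arange(T, device=v.device)) + self.modality_emb.weight[0]
        p = p + self.pos_emb(torch.arange(P, device=v.device)) + self.modality_emb.weight[1]

        # Concatenate the two streams and let self-attention mix them.
        x = self.encoder(torch.cat([v, p], dim=1))  # (B, T + P, d_model)

        frame_logits = self.frame_head(x[:, :T]).squeeze(-1)           # (B, T)
        clip_logits = self.clip_head(x[:, :T].mean(dim=1)).squeeze(-1)  # (B,)
        return frame_logits, clip_logits


if __name__ == "__main__":
    model = CrossModalKeywordSpotter()
    video = torch.randn(2, 75, 512)          # e.g. 3 s of lip features at 25 fps
    keyword = torch.randint(0, 44, (2, 6))   # phoneme IDs of the query word
    frame_logits, clip_logits = model(video, keyword)
    print(frame_logits.shape, clip_logits.shape)  # (2, 75) and (2,)
```

The per-frame logits give the temporal location of the keyword when it is present, while the pooled clip-level logit answers whether it occurs at all; both outputs would be trained with binary cross-entropy under this assumed setup.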
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| visual-keyword-spotting-on-lrs2 | Transpotter | |
| visual-keyword-spotting-on-lrs3-ted | Transpotter | |
| visual-keyword-spotting-on-lrw | Transpotter | Top-1 Accuracy: 85.8, Top-5 Accuracy: 99.6, mAP: 64.1 |