Lipreading on LRS2
Metrics: Word Error Rate (WER)

Results
Performance results of various models on this benchmark (lower WER is better).
| Model Name | Word Error Rate (WER) | Paper Title |
| --- | --- | --- |
| LIBS | 65.29 | Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading |
| Hybrid CTC / Attention | 50 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers |
| CTC/Attention | 32.9 | Visual Speech Recognition for Multiple Languages in the Wild |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
| ES³ Base | 30.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization |
| VTP | 28.9 | Sub-word Level Lip Reading With Visual Attention |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations |
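All entries are scored by Word Error Rate, i.e. (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is an illustrative word-level edit-distance implementation of that definition, not the benchmark's official scoring script; the `wer` function and the example sentences are assumptions introduced here for clarity.

```python
# Minimal sketch (not HyperAI's evaluation code): WER as word-level
# Levenshtein distance between a reference and a hypothesis transcript,
# i.e. (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i              # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Example: 1 substitution over 4 reference words -> WER = 0.25 (25%)
    print(wer("the cat sat down", "the cat sad down"))
```

The leaderboard reports WER as a percentage, so a table value of 24.6 corresponds to a ratio of 0.246 under this definition.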