HyperAI
HyperAI超神经
首页
算力平台
文档
资讯
论文
教程
数据集
百科
SOTA
LLM 模型天梯
GPU 天梯
顶会
开源项目
全站搜索
关于
服务条款
隐私政策
中文
HyperAI
HyperAI超神经
Toggle Sidebar
全站搜索…
⌘
K
Command Palette
Search for a command to run...
算力平台
首页
SOTA
唇语识别
Lipreading On Lrs2
Lipreading On Lrs2
评估指标
Word Error Rate (WER)
评测结果
各个模型在此基准测试上的表现结果
Columns
模型名称
Word Error Rate (WER)
Paper Title
LIBS
65.29
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
TM-CTC + extLM
54.7
Deep Audio-Visual Speech Recognition
CTC + KD ASR
53.2
ASR is all you need: cross-modal distillation for lip reading
Conv-seq2seq
51.7
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading
Hybrid CTC / Attention
50
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
LF-MMI TDNN
48.86
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
TM-seq2seq + extLM
48.3
Deep Audio-Visual Speech Recognition
Multi-head Visual-Audio Memory
44.5
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
MoCo + wav2vec (w/o extLM)
43.2
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
Hybrid CTC / Attention
39.1
End-to-end Audio-visual Speech Recognition with Conformers
CTC/Attention
32.9
Visual Speech Recognition for Multiple Languages in the Wild
ES³ Base*
31.4
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Base
30.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Base* + extLM
29.3
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
SyncVSR
28.9
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
VTP
28.9
Sub-word Level Lip Reading With Visual Attention
ES³ Base + extLM
28.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Large
26.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
CTC/Attention (LRW+LRS2/3+AVSpeech)
25.5
Visual Speech Recognition for Multiple Languages in the Wild
ES³ Large + extLM
24.6
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
0 of 25 row(s) selected.
Previous
Next
HyperAI
HyperAI超神经
首页
算力平台
文档
资讯
论文
教程
数据集
百科
SOTA
LLM 模型天梯
GPU 天梯
顶会
开源项目
全站搜索
关于
服务条款
隐私政策
中文
HyperAI
HyperAI超神经
Toggle Sidebar
全站搜索…
⌘
K
Command Palette
Search for a command to run...
算力平台
首页
SOTA
唇语识别
Lipreading On Lrs2
Lipreading On Lrs2
评估指标
Word Error Rate (WER)
评测结果
各个模型在此基准测试上的表现结果
Columns
模型名称
Word Error Rate (WER)
Paper Title
LIBS
65.29
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
TM-CTC + extLM
54.7
Deep Audio-Visual Speech Recognition
CTC + KD ASR
53.2
ASR is all you need: cross-modal distillation for lip reading
Conv-seq2seq
51.7
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading
Hybrid CTC / Attention
50
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
LF-MMI TDNN
48.86
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
TM-seq2seq + extLM
48.3
Deep Audio-Visual Speech Recognition
Multi-head Visual-Audio Memory
44.5
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
MoCo + wav2vec (w/o extLM)
43.2
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
Hybrid CTC / Attention
39.1
End-to-end Audio-visual Speech Recognition with Conformers
CTC/Attention
32.9
Visual Speech Recognition for Multiple Languages in the Wild
ES³ Base*
31.4
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Base
30.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Base* + extLM
29.3
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
SyncVSR
28.9
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
VTP
28.9
Sub-word Level Lip Reading With Visual Attention
ES³ Base + extLM
28.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
ES³ Large
26.7
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
CTC/Attention (LRW+LRS2/3+AVSpeech)
25.5
Visual Speech Recognition for Multiple Languages in the Wild
ES³ Large + extLM
24.6
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
0 of 25 row(s) selected.
Previous
Next
Lipreading On Lrs2 | SOTA | HyperAI超神经