Date

4 years ago

Organization

Publish URL

vipl.ict.ac.cn

Paper URL

arxiv.org

License

Non-Commercial

Tags

Image Recognition

Audio Recognition

Multimodal

Natural Language Processing

CAS-VSR-W1k, formerly known as LRW-1000, is the largest publicly available word-level Mandarin lip reading dataset. It contains 1,000 word classes and 700,000 samples from more than 2,000 speakers, totaling more than 1,000,000 Chinese character instances. Each class corresponds to the syllables of a Mandarin word composed of one or more Chinese characters. The dataset is designed to cover the natural variation in speech modes and imaging conditions, so that it reflects the challenges encountered in real applications.

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Nemotron Personas France (French Synthetic Personas Dataset)

2 months ago

Student Mental Health and Burnout Dataset

2 months ago

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

2 months ago

Nemotron-Personas-Brazil Brazilian Synthetic Character Dataset

4 months ago

RoVid-X Robot Video Generation Dataset

2 months ago

DeepPlanning Long-Term Planning Capability Assessment Dataset

4 months ago

Sonar Signal Underwater Sonar Signal Dataset

5 months ago

Nemotron-Math-v2 Mathematical Inference Dataset

5 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

5 months ago

MCIF Multimodal Cross-Language Instruction Following Dataset

5 months ago

TxT360-3efforts Multi-Task Inference Dataset

5 months ago

CAS-VSR-W1k Lip Reading Recognition Dataset
