Date

2 months ago

Organization

Paper URL

2510.20888

License

Apache 2.0

Tags

Image-to-Video

Text-to-Video

VAP-Data, released in 2025 by ByteDance in collaboration with the Chinese University of Hong Kong, is currently the largest dataset for semantically controlled video generation. The related research paper is titled "Video-As-Prompt: Unified Semantic Control for Video Generation". The dataset aims to provide high-quality training data and evaluation benchmarks for controlled video generation, controlled motion synthesis, and multimodal video models.

This dataset contains over 90,000 carefully curated paired samples, covering 100 fine-grained semantic conditions across four semantic categories: concept, style, action, and shot. Each semantic category includes multiple sets of mutually aligned video instances. The video content is highly diverse in lighting, perspective, scene, and dynamics, which supports building cross-semantic, finely controlled video generation systems and provides a comprehensive setting for evaluating a model's controllability and generalization ability.
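To make the notion of paired samples concrete, here is a minimal sketch of how clips sharing one semantic condition could be grouped into (reference, target) pairs. It assumes that a pair consists of two distinct clips with the same category and condition; the `Clip` record, its field names, the `build_pairs` helper, and the file and condition names are all hypothetical and do not reflect the dataset's actual file layout or schema.

```python
from collections import defaultdict
from itertools import permutations
from typing import NamedTuple


class Clip(NamedTuple):
    """One video entry. All field names are hypothetical."""
    path: str
    category: str   # one of: concept, style, action, shot
    condition: str  # fine-grained semantic condition label


def build_pairs(clips):
    """Return (reference, target) pairs of clips sharing a condition."""
    groups = defaultdict(list)
    for clip in clips:
        groups[(clip.category, clip.condition)].append(clip)

    pairs = []
    for members in groups.values():
        # Every ordered pair of distinct clips in a group is one sample.
        pairs.extend(permutations(members, 2))
    return pairs


# Placeholder entries for illustration only, not taken from the dataset.
clips = [
    Clip("a.mp4", "style", "style_01"),
    Clip("b.mp4", "style", "style_01"),
    Clip("c.mp4", "style", "style_01"),
    Clip("d.mp4", "shot", "shot_01"),
]

pairs = build_pairs(clips)
```

In this sketch a group with a single clip yields no pairs, so the lone `shot_01` entry above contributes nothing. Whether the released data uses ordered pairs or some other pairing scheme should be checked against the dataset's own documentation.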

Related Datasets

Lifestyle Data

Envision Multi-Stage Event Visual Generation Dataset

Camera Clone Multi-view Dataset

Med-Banana-50K Medical Image Editing Dataset

PhysToolBench Physics Tool Task Dataset

ShiftySpeech Speech Distribution Evaluation Dataset

DetectiumFire Multimodal Fire Understanding Dataset

MMSVGBench Multimodal Vector Graphics Generation Benchmark Dataset

MeshCoder: Structured 3D Object-Code Dataset
