ViTT Dense Video Description Dataset
Date
Publish URL
Paper URL
License
Other

ViTT stands for Video Timeline Tags, which consists of 8,169 videos with manually generated segment-level annotations. Among them, 5,840 videos are annotated once, and the rest are annotated twice or more. A total of 12,461 sets of annotations have been released for this dataset. The videos in this dataset come from the Youtube-8M dataset.
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.