CapsFusion-120M Multimodal Image and Text Dataset

This multimodal image-text dataset was released by Tsinghua University and BAAI in 2024. It accompanies the paper "CapsFusion: Rethinking Image-Text Data at Scale", which was accepted at CVPR 2024.
The dataset is a high-quality resource for large-scale multimodal pre-training. This release pairs each image with its corresponding captions from the LAION-2B and LAION-COCO datasets, which enables side-by-side comparison and further research on image-text data quality.
Each data entry has four fields:
- Image URL
- LAION-2B caption (the original alt-text from the web)
- LAION-COCO caption (synthesized by BLIP)
- CapsFusion caption (produced by the CapsFusion research team)
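Because every entry carries three captions for the same image, a simple comparison is to measure how long each caption source tends to be. The sketch below assumes a record layout with hypothetical field names (`image_url`, `laion_2b_caption`, `laion_coco_caption`, `capsfusion_caption`); the released files may name their columns differently, and the sample record is made up for illustration, not taken from the dataset.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CapsFusionRecord:
    # Hypothetical field names; the actual column names in the release may differ.
    image_url: str
    laion_2b_caption: str    # original web alt-text
    laion_coco_caption: str  # BLIP-synthesized caption
    capsfusion_caption: str  # CapsFusion caption


def mean_word_counts(records):
    """Average caption length, in whitespace-separated words, for each caption source."""
    sources = ("laion_2b_caption", "laion_coco_caption", "capsfusion_caption")
    return {
        src: mean(len(getattr(r, src).split()) for r in records)
        for src in sources
    }


# Made-up example record, only to show the shape of the data.
records = [
    CapsFusionRecord(
        image_url="https://example.com/cat.jpg",
        laion_2b_caption="cat photo",
        laion_coco_caption="a cat sitting on a couch",
        capsfusion_caption="a gray tabby cat sitting on a brown couch",
    ),
]
print(mean_word_counts(records))
```

Word count is only a rough proxy for caption richness; `mean` raises `statistics.StatisticsError` on an empty list, so pass at least one record.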