CC12M Image-Text Pairs Dataset

CC12M (Conceptual 12M) is an image-text pair dataset designed for vision-and-language pre-training. It contains 12 million image-text pairs. Compared with models pre-trained on CC3M, models pre-trained on CC12M perform better on multiple downstream tasks, particularly those that require long-tail visual recognition.