COYO-700M Image-Text Pair Dataset

Date: 2 years ago
Size: 104.46 GB
Organization:
Publish URL: github.com

COYO-700M is a large dataset of 747 million image-text pairs, each accompanied by additional meta-attributes that make the data easier to use when training a variety of models. It follows a strategy similar to that of earlier vision-and-language datasets: collecting informative alternative (alt) texts from HTML documents together with their associated images.

Data Collection Process

From October 2020 to August 2021, the research team collected approximately 10 billion pairs of alternative text and image sources from HTML documents in CommonCrawl, then eliminated uninformative pairs at minimal cost through filtering at both the image level and the text level. The figure outlines the research team's data collection process.
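
As a rough illustration of the two-level filtering described above, here is a minimal Python sketch. The page does not state the actual filtering rules, so the `Pair` record, the function names, and every threshold (`MIN_SIDE`, `MAX_ASPECT_RATIO`, `MIN_TEXT_CHARS`) are hypothetical placeholders chosen for the example, not the research team's criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Pair:
    """One candidate (alt text, image) pair; a hypothetical record layout."""
    alt_text: str
    image_url: str
    width: int
    height: int


# Hypothetical thresholds, for illustration only (not taken from the source).
MIN_SIDE = 200           # smallest allowed image side, in pixels
MAX_ASPECT_RATIO = 3.0   # longest side divided by shortest side
MIN_TEXT_CHARS = 5       # shortest allowed alt text after trimming


def passes_image_filter(pair: Pair) -> bool:
    """Image-level check: reject tiny or extremely elongated images."""
    short_side = min(pair.width, pair.height)
    long_side = max(pair.width, pair.height)
    if short_side < MIN_SIDE:
        return False
    return long_side / short_side <= MAX_ASPECT_RATIO


def passes_text_filter(pair: Pair) -> bool:
    """Text-level check: reject empty, very short, or letter-free alt text."""
    text = pair.alt_text.strip()
    if len(text) < MIN_TEXT_CHARS:
        return False
    return any(ch.isalpha() for ch in text)


def filter_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep only the pairs that pass both the image and the text checks."""
    return [p for p in pairs if passes_image_filter(p) and passes_text_filter(p)]


if __name__ == "__main__":
    candidates = [
        Pair("A dog running on the beach", "https://example.com/a.jpg", 640, 480),
        Pair("", "https://example.com/b.jpg", 640, 480),
        Pair("Banner", "https://example.com/c.jpg", 1200, 100),
    ]
    for kept in filter_pairs(candidates):
        print(kept.image_url)
```

Both checks rely only on metadata and string operations, which is one way such a pass could stay cheap at the scale of billions of candidates; a real pipeline would likely add further rules.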

coyo-700m.torrent
Seeding: 1 | Downloading: 0 | Completed: 151 | Total Downloads: 342
  • coyo-700m/
    • README.md (1.32 KB)
    • README.txt (2.63 KB)
    • data/
      • coyo-700m.zip (104.46 GB)
