OpenWebMath: Open Web Mathematics Training Dataset
OpenWebMath is a dataset of high-quality mathematical text drawn from across the Internet. It was filtered and extracted from more than 200B HTML files on Common Crawl, yielding 6.3 million documents with a total of 14.7B tokens. OpenWebMath is intended for pre-training and fine-tuning large language models.