OpenThoughts2-1M is an open-source reasoning dataset released by Open Thoughts in 2025. The related paper is "OpenThoughts: Data Recipes for Reasoning Models".
The dataset builds on the OpenThoughts-114k dataset and adds existing datasets such as OpenR1, along with other math and code reasoning data. It contains 1 million high-quality examples covering math, science, code, and puzzles. The OpenThinker2 model trained on this dataset performs comparably to the DeepSeek-R1-Distill model.

Data Structure