Nemotron-Pretraining-Dataset-sample Sampling Dataset
Nemotron-Pretraining-Dataset-sample is a compact, sampled version of the Nemotron pretraining dataset released by NVIDIA in 2025. The associated paper is "NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model".
The dataset contains 10 representative subsets drawn from different components of the full pre-training and SFT corpus. These subsets cover high-quality question-answering data, math-focused extracted content, code metadata, and SFT-style instruction data. This makes the sample well suited for inspecting the data and running quick experiments.