OpenScience Multi-domain Synthetic Datasets
Date
Size
Paper URL
License: CC BY 4.0
OpenScience is a multi-domain synthetic dataset released by NVIDIA in 2023. The related paper is "Measuring Massive Multitask Language Understanding". The dataset aims to improve accuracy on challenging benchmarks such as GPQA-Diamond and MMLU-Pro through supervised fine-tuning or reinforcement learning.
The dataset contains 6 million multiple-choice question-answer pairs with detailed reasoning traces, covering multiple domains such as STEM, law, economics, and the humanities.
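As an illustration of how a multiple-choice pair with a reasoning trace might be prepared for supervised fine-tuning, here is a minimal Python sketch. The record layout (`MCQRecord` and its field names), the `to_sft_example` helper, and the sample question are all hypothetical and chosen only for illustration. The actual OpenScience schema may differ, so check the dataset's own documentation before relying on any of these names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MCQRecord:
    """Hypothetical record layout; the real OpenScience schema may differ."""
    question: str
    choices: List[str]  # answer options in order (A, B, C, ...)
    reasoning: str      # step-by-step reasoning trace
    answer: str         # letter of the correct option, e.g. "B"


def to_sft_example(record: MCQRecord) -> Dict[str, str]:
    """Turn one record into a prompt/completion pair for supervised fine-tuning."""
    letters = "ABCDEFGHIJ"
    if len(record.choices) > len(letters):
        raise ValueError("too many choices for the available option letters")
    # Label each option with its letter: "A. ...", "B. ...", and so on.
    options = "\n".join(
        f"{letters[i]}. {choice}" for i, choice in enumerate(record.choices)
    )
    prompt = (
        f"{record.question}\n{options}\n"
        "Answer with the letter of the correct option."
    )
    # The completion keeps the reasoning trace and ends with the final answer.
    completion = f"{record.reasoning}\nAnswer: {record.answer}"
    return {"prompt": prompt, "completion": completion}


# Made-up sample record, not taken from the dataset.
example = MCQRecord(
    question="Which particle carries the electromagnetic force?",
    choices=["Gluon", "Photon", "W boson", "Graviton"],
    reasoning="The electromagnetic interaction is mediated by the photon, which is option B.",
    answer="B",
)
pair = to_sft_example(example)
```

Keeping the reasoning trace in the completion, followed by a fixed-format final answer line, is one common way to let a model learn both the reasoning and an easily parsed answer. Other prompt formats are equally plausible.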