VERA Voice Reasoning Evaluation Dataset
Date
Size
Publish URL
Paper URL
License
CC BY 4.0
VERA is a large-scale, multi-task speech dataset released in 2025 by Duke University in collaboration with Adobe, designed to evaluate native speech reasoning capabilities. The related research paper is titled "Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance GapThe goal is to evaluate the reasoning ability of large models under voice-native conditions.
This dataset contains 2,931 native speech inference samples (episodes), which are divided into five tracks based on task characteristics:
- Math (115 entries): Competition math problems from AIME 2025
- Web (1,107 entries): Web browsing and information retrieval tasks from BrowseComp
- Science (161 items): Graduate-level science questions based on GPQA Diamond.
- Long-Context (548 items): Multi-round long-text reading comprehension tasks from MRCR
- Factual (1,000 entries): Factual questions and answers based on SimpleQA.
All samples are presented in native speech form, with audio synthesized by Boson Higgs Audio 2 to ensure consistent, clear, and high-quality speech performance. The audio_file field of each sample in the dataset points to the corresponding audio path.
Data Structures:
The data is organized in JSON format, and each episode contains a complete speech inference sample. Its core fields include:
- id: unique identifier
- track: The track to which it belongs (mathematical_reasoning / web / science / long_context / factual)
- turns: a number of dialogue rounds, including:
- role (fixed to user)
- text_content (Base64 encrypted text)
- audio_file (corresponding audio path)
- prefix_text and postfix_text (can be empty)
- context_documents: Supplementary contextual material (if any)
- interruptions: interrupt event logging
- metadata.expected_answer: The encrypted reference answer
- canary: The unique decryption key for this sample.
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.