Date

3 months ago

Paper URL

2602.11685

License

MIT

Tags

Finance

Medicine

Artificial Intelligence

The DRACO cross-domain deep research benchmark dataset was released by the Perplexity team for evaluating complex research tasks. The related paper, "DRACO: A Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity", aims to systematically evaluate deep research systems in terms of accuracy, completeness, and objectivity. The dataset contains 100 complex research tasks covering 40 countries and regions across five continents and 10 major application areas, including finance, shopping/product comparison, academia, and technology. Each task is a multi-step, multi-source information retrieval and analysis problem and is accompanied by evaluation criteria designed and validated by 26 domain experts. Each task's set of criteria contains approximately 40 evaluation items on average, which assess the model output along four dimensions: factual accuracy, breadth and depth of analysis, presentation quality, and citation quality. The task distribution by field is shown in the following figure:

Data Fields:

id: A unique identifier for the task
domain: The domain to which the task belongs
problem: A complete research query that requires an answer
answer: The evaluation criteria for the task, encoded in JSON format, including the specific standards for each evaluation dimension.
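
As a minimal sketch of how a record with these four fields might be handled, the Python snippet below checks that the fields are present and decodes the JSON-encoded evaluation criteria. The lowercase key name `answer`, the helper `parse_record`, and the example record (including the layout of its criteria) are assumptions made for illustration; the page does not document the inner structure of the JSON, so the sketch leaves the decoded object untouched.

```python
import json
from typing import Any, Dict

# Field names as listed above; the lowercase "answer" key is an assumption.
REQUIRED_FIELDS = ("id", "domain", "problem", "answer")


def parse_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Check the four listed fields and decode the JSON-encoded criteria."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    parsed = dict(record)
    # The criteria are stored as a JSON string; their inner layout is not
    # described on this page, so the decoded object is returned as-is.
    parsed["answer"] = json.loads(record["answer"])
    return parsed


# Hypothetical record for illustration only, not taken from the dataset.
example = {
    "id": "task-001",
    "domain": "Finance",
    "problem": "Placeholder research query.",
    "answer": json.dumps({"factual_accuracy": ["placeholder criterion"]}),
}

criteria = parse_record(example)["answer"]
print(type(criteria).__name__, list(criteria))
```

Decoding the criteria once, at load time, keeps any later scoring code working with ordinary Python objects rather than raw JSON strings.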

Related Datasets

MDPBench Multilingual Document Parsing Benchmark Dataset

24 days ago

Open-RL Inference Problem Dataset

4 months ago

Pan-Cancer scRNA-Seq Cancer Single-Cell Transcriptional Atlas Dataset

3 months ago

CL-bench Context Learning Evaluation Benchmark Dataset

5 months ago
