Natural Questions on TheoremQA
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (%) | Paper Title |
| --- | --- | --- |
| GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
Natural Questions on TheoremQA | SOTA | HyperAI