Question Answering on NewsQA
Metrics: EM (exact match), F1
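The page does not define its metrics, but EM and F1 on NewsQA are conventionally the SQuAD-style extractive-QA scores: EM checks whether the normalized predicted span exactly equals the reference, and F1 is the harmonic mean of token-level precision and recall between the two spans. A minimal sketch of that convention (normalization details may differ from HyperAI's actual scoring script):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# e.g. exact_match("The Eiffel Tower", "eiffel tower") -> 1.0 (articles are stripped)
```

Leaderboard numbers are these per-example scores averaged over the dataset and reported as percentages.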
Results
Performance results of various models on this benchmark
| Model Name | EM | F1 | Paper Title |
|---|---|---|---|
| OpenAI/o3-mini-2025-01-31-high | 96.52 | 92.13 | o3-mini vs DeepSeek-R1: Which One is Safer? |
| OpenAI/o1-2024-12-17-high | 81.44 | 88.7 | 0/1 Deep Neural Networks via Block Coordinate Descent |
| xAI/grok-2-1212 | 70.57 | 88.24 | XAI for Transformers: Better Explanations through Conservative Propagation |
| deepseek-r1 | 80.57 | 86.13 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |
| Riple/Saanvi-v0.1 | 72.61 | 85.44 | Time-series Transformer Generative Adversarial Networks |
| Anthropic/claude-3-5-sonnet | 74.23 | 82.3 | Claude 3.5 Sonnet Model Card Addendum |
| OpenAI/GPT-4o | 70.21 | 81.74 | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data |
| Google/Gemini 1.5 Flash | 68.75 | 79.91 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
| SpanBERT | - | 73.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans |
| LinkBERT (large) | - | 72.6 | LinkBERT: Pretraining Language Models with Document Links |
| DyREX | - | 68.53 | DyREx: Dynamic Query Representation for Extractive Question Answering |
| DecaProp | 53.1 | 66.3 | Densely Connected Attention Propagation for Reading Comprehension |
| BERT+ASGen | 54.7 | 64.5 | - |
| AMANDA | 48.4 | 63.7 | A Question-Focused Multi-Factor Attention Network for Question Answering |
| MINIMAL(Dyn) | 50.1 | 63.2 | Efficient and Robust Question Answering from Minimal Context over Documents |
| FastQAExt | 43.7 | 56.1 | Making Neural QA as Simple as Possible but not Simpler |
Question Answering on NewsQA | SOTA | HyperAI