Question Answering on TriviaQA
Metrics
EM (Exact Match)
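The EM metric reported here is exact match: a prediction counts as correct only if, after light normalization, the answer string matches one of the accepted reference answers (TriviaQA provides multiple aliases per question). Below is a minimal sketch of such a scorer, assuming the SQuAD-style normalization (lowercasing, stripping punctuation and articles, collapsing whitespace) that TriviaQA evaluation commonly follows; function names are illustrative and not taken from the official evaluation script.

```python
# Illustrative EM scorer in the SQuAD/TriviaQA style; the official TriviaQA
# evaluation script may differ in normalization details.
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_aliases: list[str]) -> bool:
    """A prediction is correct if it matches any accepted alias."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(alias) for alias in gold_aliases)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions answered with an exact match."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Example: one correct, one incorrect -> EM = 50.0
print(em_score(["The Nile", "Paris"], [["Nile", "Nile River"], ["London"]]))
```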
Results
Performance results of various models on this benchmark
| Model Name | EM | Paper Title |
| --- | --- | --- |
| Claude 2 (few-shot, k=5) | 87.5 | Model Card and Evaluations for Claude Models |
| GPT-4-0613 | 87 | - |
| Claude 1.3 (few-shot, k=5) | 86.7 | Model Card and Evaluations for Claude Models |
| RankRAG-llama3-70b (Zero-Shot, KILT) | 86.5 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
| PaLM 2-L (one-shot) | 86.1 | PaLM 2 Technical Report |
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 85.6 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG |
| LLaMA 2 70B (one-shot) | 85 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| GPT-4-0613 (Zero-shot) | 84.8 | GPT-4 Technical Report |
| RankRAG-llama3-8b (Zero-Shot, KILT) | 82.9 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
| PaLM 2-M (one-shot) | 81.7 | PaLM 2 Technical Report |
| PaLM-540B (One-Shot) | 81.4 | PaLM: Scaling Language Modeling with Pathways |
| PaLM-540B (Few-Shot) | 81.4 | PaLM: Scaling Language Modeling with Pathways |
| ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 81.0 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling |
| Claude Instant 1.1 (few-shot, k=5) | 78.9 | Model Card and Evaluations for Claude Models |
| code-davinci-002 175B + REPLUG LSR (Few-Shot) | 77.3 | REPLUG: Retrieval-Augmented Black-Box Language Models |
| PaLM-540B (Zero-Shot) | 76.9 | PaLM: Scaling Language Modeling with Pathways |
| code-davinci-002 175B + REPLUG (Few-Shot) | 76.8 | REPLUG: Retrieval-Augmented Black-Box Language Models |
| GLaM 62B/64E (Few-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| GLaM 62B/64E (One-shot) | 75.8 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
Showing 20 of 56 results.