Question Answering on PubMedQA
Metric: Accuracy

Results
Performance results of various models on this benchmark, ranked by accuracy.
| Model Name | Accuracy | Paper Title |
|---|---|---|
| Meditron-70B (CoT + SC) | 81.6 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| BioGPT-Large (1.5B) | 81.0 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| RankRAG-llama3-70B (Zero-Shot) | 79.8 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Flan-PaLM (540B, Few-shot) | 79.0 | Large Language Models Encode Clinical Knowledge |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| Codex (5-shot CoT) | 78.2 | Can large language models reason about medical questions? |
| Human Performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, Few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models |
| Flan-T5-XXL | 76.8 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Flan-PaLM (540B, SC) | 75.2 | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Claude 3 Opus (zero-shot) | 74.9 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science |
| CoT-T5-11B (1024-shot) | 73.42 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning |
The table above shows the top 20 of the 29 results tracked for this benchmark.
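The accuracy reported above is exact match over PubMedQA's three-way answer labels (yes / no / maybe). A minimal sketch of the computation, assuming predictions and gold labels arrive as plain strings; the helper name and the sample data are illustrative, not from any of the papers listed:

```python
def pubmedqa_accuracy(predictions, gold_labels):
    """Percentage of questions where the predicted answer matches the gold label.

    Assumes both inputs are sequences of strings drawn from
    PubMedQA's answer set {"yes", "no", "maybe"}.
    """
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    # Case- and whitespace-insensitive exact match.
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold_labels)
    )
    return 100.0 * correct / len(gold_labels)

# Hypothetical usage: 4 of 5 answers match, so accuracy is 80.0.
preds = ["yes", "no", "maybe", "yes", "no"]
gold  = ["yes", "no", "maybe", "no",  "no"]
print(f"Accuracy: {pubmedqa_accuracy(preds, gold):.1f}")  # Accuracy: 80.0
```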