Code Generation on HumanEval
Metric: Pass@1
The table below lists the performance of various models and agentic frameworks on this benchmark, sorted by Pass@1.
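Pass@1 is the fraction of HumanEval problems for which a generated solution passes all of the hidden unit tests. When several samples are drawn per problem, papers typically report the unbiased pass@k estimator introduced with HumanEval; the minimal Python sketch below (with hypothetical sample counts) shows how that estimate is computed from n generated samples per problem, of which c pass the tests. Individual leaderboard entries may use their own sampling or agentic setup, so scores are not always directly comparable.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples generated for a problem and
    c = samples that pass all of its unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical example: 200 samples for one problem, 180 of them pass its tests.
print(pass_at_k(n=200, c=180, k=1))  # 0.9, i.e. 90.0 in the table's percentage scale
```

Averaging this per-problem estimate over all 164 HumanEval problems gives the benchmark score; with a single greedy sample per problem (n = 1), Pass@1 reduces to the plain fraction of problems solved.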
| Model Name | Pass@1 | Paper Title |
| --- | --- | --- |
| Llama-3 8B (HPT) | 100 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models |
| Claude 3.5 Sonnet (HPT) | 100 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models |
| LLMDebugger (OpenAI o1) | 99.4 | Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step |
| CodeSim (o3-mini) | 98.8 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| QualityFlow (Sonnet-3.5) | 98.8 | QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks |
| Nexus (Claude 3.5 Sonnet) | 98.8 | Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation |
| LLMDebugger (GPT-4o) | 98.2 | Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step |
| LPW (GPT-4o) | 98.2 | Planning-Driven Programming: A Large Language Model Programming Workflow |
| CodeSim (GPT-4o and LDB Debugger) | 97.6 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| MGDebugger (DeepSeek-Coder-V2-Lite) | 96.3 | From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging |
| AgentCoder (GPT-4) | 96.3 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation |
| CodeSim (GPT-4o) | 95.1 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| AFlow (GPT-4o-mini) | 94.7 | AFlow: Automating Agentic Workflow Generation |
| MapCoder (GPT-4) | 93.9 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving |
| Claude 3.5 Sonnet (0-shot) | 92.0 | - |
| FractalResearch: Pioneer-SWO (GPT-4-turbo) | 91.65 | - |
| L2MAC (GPT-4) | 90.2 | L2MAC: Large Language Model Automatic Computer for Extensive Code Generation |
| GPT-4o (0-shot) | 90.2 | Claude 3.5 Sonnet Model Card Addendum |
| OctoCoder (GPT-4) | 86.6 | OctoPack: Instruction Tuning Code Large Language Models |
| Spark_FP16_medium_v4.1.1 | 85.97 | - |
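For reference, scores like these are typically produced by running generated completions through the official openai/human-eval execution harness. The sketch below follows that repository's documented usage; `generate_one_completion` is a hypothetical placeholder for whichever model or agent pipeline is being evaluated, and the harness's own caveat applies: it executes untrusted, model-generated code and should be run in a sandbox.

```python
# Sketch based on the openai/human-eval harness (https://github.com/openai/human-eval).
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Hypothetical placeholder: call your model or agentic pipeline here and
    # return the code that completes the given function prompt.
    raise NotImplementedError

problems = read_problems()  # the 164 HumanEval problems, keyed by task_id

num_samples_per_task = 1  # a single greedy sample is a common Pass@1 setup
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# Then score the completions (this step executes the generated code):
#   $ evaluate_functional_correctness samples.jsonl
# which reports a dict such as {'pass@1': ...}.
```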