Code Generation on MBPP
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Accuracy | Paper Title |
| --- | --- | --- |
| QualityFlow (Sonnet-3.5) | 94.2 | QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks |
| o1-mini + MapCoder | 93.2 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving |
| GPT-4 + AgentCoder | 91.8 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation |
| CodeSim (GPT4o) | 90.7 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| Jiutian-大模型 | 90.0 | - |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation |
| MapCoder (GPT-4o) | 89.7 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving |
| GPT-4 (ChatGPT Plus) | 87.5 | How Does Naming Affect LLMs on Code Analysis Tasks? |
| Claude 3 Opus | 86.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| LPW (GPT-4o) | 84.8 | Planning-Driven Programming: A Large Language Model Programming Workflow |
| GPT-3.5 Turbo + FlowGenScrum + Test | 83.8±0.6 | SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents |
| AFlow(GPT-4o-mini) | 83.4 | AFlow: Automating Agentic Workflow Generation |
| GPT-3.5 Turbo (ChatGPT) | 83.2 | How Does Naming Affect LLMs on Code Analysis Tasks? |
| MapCoder (GPT-4) | 83.1 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving |
| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models |
| GPT-4 (Bing Chat) | 82 | How Does Naming Affect LLMs on Code Analysis Tasks? |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models |
| MGDebugger (CodeQwen1.5) | 80.8 | From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging |
| Claude 3 Haiku | 80.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| GPT-4 (Self-Debugging with unit tests + trace) | 80.2 | Teaching Large Language Models to Self-Debug |