HyperAI

Apex V1.0

Metrics

Mean score

Results

Performance results of various models on this benchmark

| Model | Mean score | Paper Title |
| --- | --- | --- |
| GPT-5 (High) | 64.2% | - |
| Grok-4 | 61.3% | - |
| Gemini-2.5-Flash (On) | 60.4% | - |
| Gemini-2.5-Pro (On) | 60.1% | - |
| o3-Pro (High) | 60.0% | - |
| o3 (High) | 59.9% | - |
| Qwen-3-235B | 59.8% | - |
| Grok-3 | 59.3% | - |
| DeepSeek-R1 | 57.6% | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |
| GPT-OSS-120B (Medium) | 57.1% | - |
| o4-mini (High) | 56.3% | - |
| Opus-4.1 (On) | 55.3% | - |
| GLM-4.5 | 55.1% | - |
| Sonnet-4 (On) | 54.4% | - |
| Opus-4 (On) | 53.6% | - |
| Kimi-K2-Instruct | 51.1% | - |
| Llama-4-Maverick | 44.7% | - |
| Mistral-Medium-3 | 43.0% | - |
| Gemma-3-27B | 36.6% | - |
| Nova-Pro (CoT) | 36.3% | - |