HyperAI

Agentic Entropy-Balanced Policy Optimization (AEPO)

Date

3 days ago

Organization

Renmin University of China
Kuaishou Technology

Paper URL

2510.14545

Agentic Entropy-Balanced Policy Optimization (AEPO) was proposed in October 2025 by a joint research team from Renmin University of China and Kuaishou. The research findings were published in the paper "Agentic Entropy-Balanced Policy Optimization".

AEPO is an agentic reinforcement learning (RL) algorithm designed to balance entropy in both the rollout and policy-update phases. It consists of two core components: (1) a dynamic entropy-balanced rollout mechanism, which adaptively allocates the global and branch sampling budgets through entropy pre-monitoring and imposes a branch penalty on consecutive high-entropy tool-call steps to prevent over-branching; and (2) entropy-balanced policy optimization, which inserts a stop-gradient operation into the high-entropy clipping term to preserve and appropriately rescale gradients on high-entropy tokens, and incorporates entropy-aware advantage estimation to prioritize learning on high-uncertainty tokens. Results on 14 challenging datasets show that AEPO consistently outperforms 7 mainstream RL algorithms.
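The following minimal Python sketch illustrates the ideas in the two components above. It is our own illustration, not the authors' code. Only the general ideas come from the description: entropy pre-monitoring to split the rollout budget, a penalty on consecutive high-entropy branching, a stop-gradient on the clipped term, and entropy-aware advantages. The specific formulas are hypothetical stand-ins chosen for readability: the sigmoid budget split, the geometric branch penalty, and the z-score advantage scaling. All function names (`split_budget`, `branch_probability`, `surrogate_grad`, `entropy_aware_advantage`) and parameters (`beta`, `base`, `gain`, `penalty`, `alpha`) are hypothetical. The stop-gradient part works out the per-token gradient by hand instead of using an autodiff library.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_budget(total: int, h_root: float, h_tool_avg: float,
                 beta: float = 1.0) -> tuple[int, int]:
    """HYPOTHETICAL rule: split a rollout budget into (global, branch) samples.

    If tool calls raise uncertainty (h_tool_avg > h_root), more budget goes to
    branching at tool-call steps; otherwise more goes to independent rollouts.
    """
    share_global = 1.0 / (1.0 + math.exp(-beta * (h_root - h_tool_avg)))
    n_global = max(1, min(total, round(total * share_global)))
    return n_global, total - n_global


def branch_probability(delta_h: float, consecutive_high: int, base: float = 0.5,
                       gain: float = 1.0, penalty: float = 0.5) -> float:
    """HYPOTHETICAL branching probability at a tool-call step.

    Grows with the entropy increase after the tool call, but is damped
    geometrically for each consecutive high-entropy step already branched on.
    """
    raw = min(1.0, max(0.0, base + gain * delta_h))
    return raw * (penalty ** consecutive_high)


def surrogate_grad(log_ratio: float, advantage: float, eps: float = 0.2,
                   high_entropy: bool = False) -> float:
    """Gradient of the per-token PPO-clip surrogate w.r.t. log pi_theta(token).

    Standard PPO-clip: when min() picks the clipped (constant) term, the
    gradient is 0. Stop-gradient variant (sketch): for high-entropy tokens the
    clipped term is written as ratio / sg(ratio) * bound * A. It has the same
    forward value, but its gradient w.r.t. log pi is bound * A instead of 0.
    """
    ratio = math.exp(log_ratio)
    bound = min(max(ratio, 1.0 - eps), 1.0 + eps)
    unclipped = ratio * advantage
    clipped = bound * advantage
    if unclipped <= clipped:
        # min() selects the unclipped term; d(ratio * A)/d(log pi) = ratio * A.
        return ratio * advantage
    # min() selects the clipped term: normally no gradient flows.
    return bound * advantage if high_entropy else 0.0


def entropy_aware_advantage(adv: float, token_h: float, mean_h: float,
                            std_h: float, alpha: float = 0.3) -> float:
    """HYPOTHETICAL shaping: up-weight tokens whose entropy is above the batch
    mean. The scale is clamped at 0 so the advantage sign never flips."""
    z = (token_h - mean_h) / (std_h + 1e-8)
    return adv * max(0.0, 1.0 + alpha * z)


if __name__ == "__main__":
    print(split_budget(16, h_root=0.8, h_tool_avg=1.4))
    print(branch_probability(0.3, consecutive_high=2))
    print(surrogate_grad(math.log(1.5), advantage=1.0))                     # standard clip
    print(surrogate_grad(math.log(1.5), advantage=1.0, high_entropy=True))  # sg variant
```

In the last two calls the token's probability ratio (1.5) exceeds the upper clip bound (1.2) with a positive advantage. Standard clipping therefore discards the token's gradient entirely. The stop-gradient form keeps the same clipped objective value but still passes a gradient scaled to the clip bound, which is the kind of "preserve and rescale" behavior the description attributes to AEPO for high-entropy tokens.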
