Date

2 months ago

Organization

Paper URL

Tags

HiPO (Hybrid Policy Optimization) was proposed in September 2025 by a research team from Kuaishou and Nanjing University. The work was published in the paper "HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs".

HiPO is a framework for adaptive reasoning control that lets LLMs decide when to reason in detail (Think-on) and when to answer directly (Think-off). It combines two components: a hybrid data pipeline that supplies paired Think-on and Think-off responses, and a hybrid reinforcement learning reward system that balances accuracy and efficiency while discouraging over-reliance on detailed reasoning. Experiments on math and programming benchmarks show that HiPO significantly reduces token length while maintaining or improving accuracy.
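To make the idea of a hybrid reward more concrete, here is a minimal Python sketch that scores one prompt's group of sampled responses. It is an illustration under stated assumptions, not HiPO's actual formulation: the `Rollout` record, the `mode_accuracy` and `hybrid_rewards` functions, the linear length penalty, and the values of `length_weight`, `margin` and `mode_bonus` are all hypothetical. The sketch rewards correct answers, penalizes length, and gives a small bonus to whichever mode (Think-on or Think-off) the group's accuracy favours, with Think-off preferred unless Think-on is clearly more accurate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rollout:
    """One sampled response to a prompt (hypothetical record layout)."""
    think_on: bool   # True = Think-on (detailed reasoning), False = Think-off (direct answer)
    correct: bool    # whether the final answer matched the reference
    num_tokens: int  # length of the generated response in tokens


def mode_accuracy(rollouts: List[Rollout], think_on: bool) -> Optional[float]:
    """Fraction of correct rollouts in one mode, or None if that mode was not sampled."""
    group = [r for r in rollouts if r.think_on == think_on]
    if not group:
        return None
    return sum(r.correct for r in group) / len(group)


def hybrid_rewards(
    rollouts: List[Rollout],
    max_tokens: int,
    length_weight: float = 0.2,  # assumed weight of the length penalty
    margin: float = 0.1,         # assumed accuracy gap Think-on must exceed to be preferred
    mode_bonus: float = 0.1,     # assumed bonus for answering in the preferred mode
) -> Tuple[bool, List[float]]:
    """Toy hybrid reward; NOT the exact reward defined in the HiPO paper."""
    acc_on = mode_accuracy(rollouts, True) or 0.0
    acc_off = mode_accuracy(rollouts, False) or 0.0
    # Prefer Think-off unless Think-on is clearly more accurate on this prompt,
    # which discourages defaulting to long reasoning.
    prefer_on = acc_on > acc_off + margin
    rewards = []
    for r in rollouts:
        if not r.correct:
            rewards.append(0.0)  # wrong answers get nothing, however short
            continue
        # Correct answers lose up to `length_weight` as they approach max_tokens.
        reward = 1.0 - length_weight * min(r.num_tokens / max_tokens, 1.0)
        if r.think_on == prefer_on:
            reward += mode_bonus
        rewards.append(reward)
    return prefer_on, rewards


# Usage: one prompt, two Think-on and two Think-off samples.
group = [
    Rollout(think_on=True, correct=True, num_tokens=800),
    Rollout(think_on=True, correct=True, num_tokens=600),
    Rollout(think_on=False, correct=True, num_tokens=100),
    Rollout(think_on=False, correct=False, num_tokens=80),
]
prefer_on, rewards = hybrid_rewards(group, max_tokens=1000)
print(prefer_on, [round(x, 2) for x in rewards])  # prints: True [0.94, 0.98, 0.98, 0.0]
```

In this example Think-on is preferred because Think-off was right only half the time. If both modes were equally accurate, the tie would go to Think-off and the shorter answers would earn the bonus. With these default weights, only correct responses receive the length penalty and mode bonus, so a short wrong answer never outscores a correct one; the exact terms HiPO uses may differ.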

Paper URL

2509.23967

Related Wiki

Group Variance Policy Optimization (GVPO)

Given the limitations of existing fine-tuning techniques such as GRPO, GVPO has emerged as a reliable and versatile post-training paradigm.

3 months ago

Agentic Entropy-Balanced Policy Optimization (AEPO)

AEPO balances entropy in both rollout branching and policy updates, guided by high-entropy tool calls.

2 months ago

MultiPL-MoE Architecture

MultiPL-MoE is an effective method for extending LLMs to low-resource programming languages during the post-pretraining stage.

2 months ago

Discriminative Constrained Optimization Framework (DisCO)

DisCO is a novel, principled discriminative constrained optimization framework that avoids difficulty bias and training instability.

2 months ago

Multi-agent Workflow CudaForge

CudaForge is a simple, effective, and low-cost multi-agent workflow for CUDA kernel generation and optimization.

2 months ago

DexFlyWheel Data Generation Framework

DexFlyWheel is a scalable, self-improving data generation paradigm for dexterous manipulation.

3 months ago

Exponential-Gaussian Mixture Network EGMN

EGMN captures latent interaction effects between user preferences and video features.

2 months ago

Cache-to-Cache (C2C)

C2C enables direct semantic communication by transforming and fusing key-value (KV) caches between models.

2 months ago

Gated Attention

The Tongyi Qianwen team systematically studied the role of gating mechanisms in standard softmax attention.

2 months ago
