HiPO: Hybrid Policy Optimization Framework

Date

3 days ago

Organization

Nanjing University
Kuaishou Technology

Paper URL

2509.23967

HiPO (Hybrid Policy Optimization) was proposed in September 2025 by a research team from Kuaishou Technology and Nanjing University in the paper "HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs".

HiPO is a framework for adaptive reasoning control that lets an LLM decide when to reason in detail (Think-on) and when to answer directly (Think-off). It combines two components: a hybrid data pipeline that supplies paired Think-on and Think-off responses, and a hybrid reinforcement learning reward system that balances accuracy against efficiency while discouraging over-reliance on detailed reasoning. In experiments on math and coding benchmarks, HiPO substantially reduces token length while maintaining or improving accuracy.
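
To make the accuracy/efficiency trade-off concrete, here is a minimal toy sketch in Python. It is not the reward defined in the HiPO paper: the `Response` class, the `toy_hybrid_reward` and `preferred_mode` functions, and the `length_weight` and `max_tokens` parameters are all hypothetical names chosen for illustration. The sketch only shows how a reward that combines correctness with a length cost can favor Think-off on easy questions and Think-on on hard ones.

```python
from dataclasses import dataclass


@dataclass
class Response:
    mode: str         # "think_on" or "think_off"
    correct: bool     # whether the final answer was judged correct
    num_tokens: int   # length of the generated response


def toy_hybrid_reward(resp: Response,
                      length_weight: float = 0.2,
                      max_tokens: int = 1000) -> float:
    """Illustrative reward: correctness minus a capped length cost.

    This is an assumed form for demonstration, not the paper's formula.
    """
    accuracy = 1.0 if resp.correct else 0.0
    length_cost = length_weight * min(resp.num_tokens, max_tokens) / max_tokens
    return accuracy - length_cost


def preferred_mode(think_on: Response, think_off: Response) -> str:
    """Pick the mode with the higher toy reward; ties go to Think-off."""
    r_on = toy_hybrid_reward(think_on)
    r_off = toy_hybrid_reward(think_off)
    return "think_on" if r_on > r_off else "think_off"


# Easy question: both modes are correct, so the shorter answer wins.
easy = preferred_mode(Response("think_on", True, 800),
                      Response("think_off", True, 50))

# Hard question: only detailed reasoning is correct, so it wins despite its length.
hard = preferred_mode(Response("think_on", True, 800),
                      Response("think_off", False, 50))

print(easy, hard)
```

In this sketch the paired Think-on and Think-off responses play the role of the paired data described above, and the length cost stands in for whatever mechanism the actual reward uses to avoid over-reliance on detailed reasoning.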

Figure: HiPO framework diagram
