HiPO Hybrid Policy Optimization Framework
HiPO (Hybrid Policy Optimization) was proposed in September 2025 by a research team from Kuaishou and Nanjing University and is described in the paper "HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs".
HiPO is a framework for adaptive reasoning control that enables LLMs to selectively decide when to reason in detail (Think-on) and when to answer directly (Think-off). It combines two components: a hybrid data pipeline that supplies paired Think-on and Think-off responses, and a hybrid reinforcement learning reward system that balances accuracy against efficiency while discouraging over-reliance on detailed reasoning. Experiments on mathematics and coding benchmarks show that HiPO substantially reduces response length in tokens while maintaining or improving accuracy.
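The paper's exact reward formulation is not reproduced here. As a rough illustration of how a reward can trade accuracy against length and discourage unnecessary Think-on responses, the following is a minimal, hypothetical Python sketch. The names `Response`, `hybrid_reward`, and `preferred_mode`, the weights `length_weight` and `think_on_bias`, and the `max_tokens` cap are all assumptions for illustration, not HiPO's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One sampled response. Hypothetical structure, not from the HiPO paper."""
    mode: str        # "think_on" (detailed reasoning) or "think_off" (direct answer)
    correct: bool    # whether the final answer matches the reference
    num_tokens: int  # length of the generated response in tokens


def hybrid_reward(resp: Response, max_tokens: int = 4096,
                  length_weight: float = 0.2, think_on_bias: float = 0.1) -> float:
    """Illustrative reward: correctness, minus a length penalty on correct
    answers, minus a fixed extra cost whenever Think-on is chosen."""
    if resp.mode not in ("think_on", "think_off"):
        raise ValueError(f"unknown mode: {resp.mode}")
    accuracy = 1.0 if resp.correct else 0.0
    # Length penalty scaled to [0, length_weight]; applied only to correct
    # answers so that a short wrong answer never beats a long correct one.
    length_penalty = length_weight * min(resp.num_tokens, max_tokens) / max_tokens
    reward = accuracy - (length_penalty if resp.correct else 0.0)
    # Hypothetical bias against detailed reasoning when it is not needed.
    if resp.mode == "think_on":
        reward -= think_on_bias
    return reward


def preferred_mode(on: Response, off: Response, **kwargs) -> str:
    """Given a paired Think-on / Think-off response to the same query,
    return the mode with the higher reward (ties favor Think-off)."""
    if hybrid_reward(off, **kwargs) >= hybrid_reward(on, **kwargs):
        return "think_off"
    return "think_on"


if __name__ == "__main__":
    on = Response("think_on", correct=True, num_tokens=2000)
    off = Response("think_off", correct=True, num_tokens=100)
    off_wrong = Response("think_off", correct=False, num_tokens=80)
    print(preferred_mode(on, off))        # both correct: the shorter Think-off wins
    print(preferred_mode(on, off_wrong))  # only Think-on is correct: Think-on wins
```

Applying the length penalty only to correct answers keeps accuracy the dominant term: as long as `length_weight + think_on_bias < 1`, every correct response scores above every incorrect one, and the length and mode terms only decide between responses of equal correctness.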
