FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization