DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards