RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards