Mean Velocity Policy (MVP)
The Mean Velocity Policy (MVP) was jointly proposed by research teams from Tsinghua University (the School of Vehicle and Mobility and the School of Artificial Intelligence), the Berkeley Artificial Intelligence Research (BAIR) lab at the University of California, Berkeley, and the University of Hong Kong. The work was accepted as a conference paper at the International Conference on Learning Representations (ICLR 2026) and is presented in the paper "Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation".
MVP is a generative policy for reinforcement learning that produces actions in a single step by modeling an "average velocity field," eliminating the computational overhead of multi-step sampling. Because the average-velocity formulation lacks explicit boundary conditions, the team introduced an "instantaneous velocity constraint" (IVC), which improves learning accuracy and policy expressiveness. In practice, MVP substantially accelerates training and inference (average single-step inference takes only 10.93 milliseconds) and achieves a state-of-the-art average success rate of 0.88 on complex robot manipulation tasks in Robomimic and OGBench.
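The one-step idea can be illustrated with a toy sketch. In mean-flow models, the average velocity satisfies x_t = x_r + (t − r)·u(x_r, r, t), so a single evaluation of u maps noise directly to a sample, whereas ordinary flow models integrate the instantaneous velocity over many steps. Everything below is illustrative, not from the paper: the closed-form velocity field and function names are assumptions, and in MVP a neural network would predict the average velocity conditioned on the state.

```python
import numpy as np

def instantaneous_velocity(x, t):
    # Toy closed-form field v(x, t) = -x (stands in for a learned flow).
    return -x

def average_velocity(x_r, r, t, n=1000):
    # Average velocity u(x_r, r, t) = (x_t - x_r) / (t - r), obtained here by
    # numerically integrating the instantaneous field with Euler steps.
    # In MVP, a network predicts this quantity directly, so no loop is needed
    # at inference time.
    x, s, dt = x_r.copy(), r, (t - r) / n
    for _ in range(n):
        x = x + dt * instantaneous_velocity(x, s)
        s += dt
    return (x - x_r) / (t - r)

def one_step_action(noise):
    # One-step generation: a = z + (1 - 0) * u(z, 0, 1); a single evaluation
    # replaces the multi-step sampling loop.
    return noise + 1.0 * average_velocity(noise, 0.0, 1.0)

z = np.array([1.0, -2.0])       # "noise" sample
a = one_step_action(z)
# For v(x) = -x the exact flow is x_t = x_0 * exp(-t), so a ≈ z * e^(-1).
```

Here the integration loop lives inside `average_velocity` only to fake a ground-truth average; the point of MVP is that a trained network replaces it, leaving `one_step_action` as a single forward pass.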