Matteo Hessel Ivo Danihelka Fabio Viola Arthur Guez Simon Schmitt Laurent Sifre Theophane Weber David Silver Hado van Hasselt

Abstract
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| atari-games-on-atari-game | Muesli | Human World Record Breakthrough: 5 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.