Asynchronous Methods for Deep Reinforcement Learning

Volodymyr Mnih; Adrià Puigdomènech Badia; Mehdi Mirza; Alex Graves; Timothy P. Lillicrap; Tim Harley; David Silver; Koray Kavukcuoglu

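Abstract

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers. The best-performing method, an asynchronous variant of actor-critic, surpasses the current state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using visual input.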
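To make the idea of parallel actor-learners concrete, here is a minimal sketch in Python. It is not the authors' implementation: the five-cell corridor environment, the tabular softmax policy and value table, the constants, and the names `worker` and `train` are all assumptions chosen for illustration. Each thread copies the shared parameters, acts for a few steps, computes n-step advantage actor-critic gradients locally, and then writes them back to the shared parameters without taking a lock.

```python
import math
import random
import threading

N_STATES = 5   # hypothetical corridor: cells 0..4, cell 4 is the goal
GAMMA = 0.9    # discount factor (illustrative value)
T_MAX = 5      # steps each worker collects before one update
LR = 0.05      # shared learning rate (illustrative value)


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Action 0 moves left, action 1 moves right; reaching the goal pays 1."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def worker(shared, n_updates, seed):
    rng = random.Random(seed)  # each actor-learner explores with its own RNG
    state = 0
    for _ in range(n_updates):
        # 1. Snapshot the shared parameters into thread-local copies.
        theta = [row[:] for row in shared["theta"]]
        value = shared["value"][:]

        # 2. Act for up to T_MAX steps with the local policy.
        rollout, done = [], False
        for _ in range(T_MAX):
            probs = softmax(theta[state])
            action = 0 if rng.random() < probs[0] else 1
            nxt, reward, done = step(state, action)
            rollout.append((state, action, reward, probs))
            state = nxt
            if done:
                break

        # 3. Accumulate gradients backwards using n-step returns,
        #    bootstrapping from the critic unless the episode ended.
        ret = 0.0 if done else value[state]
        d_theta = [[0.0, 0.0] for _ in range(N_STATES)]
        d_value = [0.0] * N_STATES
        for s, a, r, probs in reversed(rollout):
            ret = r + GAMMA * ret
            adv = ret - value[s]
            for i in range(2):
                d_theta[s][i] += adv * ((1.0 if i == a else 0.0) - probs[i])
            d_value[s] += adv

        # 4. Apply the update to the shared parameters without a lock.
        for s in range(N_STATES):
            for i in range(2):
                shared["theta"][s][i] += LR * d_theta[s][i]
            shared["value"][s] += LR * d_value[s]

        if done:
            state = 0  # start a new episode


def train(n_workers=4, n_updates=2000):
    shared = {
        "theta": [[0.0, 0.0] for _ in range(N_STATES)],  # actor preferences
        "value": [0.0] * N_STATES,                       # critic estimates
    }
    threads = [
        threading.Thread(target=worker, args=(shared, n_updates, seed))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Calling `train()` and then evaluating `softmax(shared["theta"][s])` for each non-goal cell shows how strongly the learned policy prefers moving right. Exact values differ between runs because the thread interleaving is not deterministic. In CPython the threads do not run in parallel on multiple cores, so this sketch illustrates the update pattern rather than the speedup reported in the abstract.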

Code Repositories

Repository	Framework	Source
wtingda/DeepRLBreakout	-	Mentioned on GitHub
ShibiHe/Q-Optimality-Tightening	-	Mentioned on GitHub
Kaixhin/ACER	pytorch	Mentioned on GitHub
hulanwin/A3C-DRL	-	Mentioned on GitHub
nvlabs/gbrl_sb3	pytorch	Mentioned on GitHub
miyosuda/async_deep_reinforce	-	Mentioned on GitHub
pytorch/rl/tree/main/examples/a2c	jax	-
AI-RG/rl-experiments	-	Mentioned on GitHub
wxj77/TransferReinforcementLearning	-	Mentioned on GitHub
muupan/async-rl	-	Mentioned on GitHub
bkhmsi/meta-rl-harlow	pytorch	Mentioned on GitHub
ray-project/ray/tree/master/rllib	-	-
toni-sm/skrl	jax	-
amanda-lambda/hack-flappy-bird-drl	pytorch	Mentioned on GitHub
ofekluis/sonic_project_ss19	-	Mentioned on GitHub
dickreuter/neuron_poker	-	Mentioned on GitHub
MatheusMRFM/A3C-LSTM-with-Tensorflow	-	-
deepsense-ai/Distributed-BA3C	-	Mentioned on GitHub
aabbeell/reinforcementLearning.a2c.gym	-	Mentioned on GitHub
avillemin/Minecraft-AI	pytorch	Mentioned on GitHub
marload/deep-rl-tf2	-	Mentioned on GitHub
alexmlamb/blocks_rl_gru_setup	pytorch	Mentioned on GitHub
Kaixhin/NoisyNet-A3C	pytorch	Mentioned on GitHub
N0r9st/a2c-jax	jax	-
joshiatul/game_playing	-	Mentioned on GitHub
Khrylx/PyTorch-RL	pytorch	Mentioned on GitHub
uvipen/Super-mario-bros-A3C-pytorch	pytorch	Mentioned on GitHub
mavischer/DRRL	pytorch	Mentioned on GitHub
gungui98/deeprl-a3c-ai2thor	-	Mentioned on GitHub
Nasdin/ReinforcementLearning-AtariGame	pytorch	Mentioned on GitHub
amanda-lambda/drl-experiments	pytorch	Mentioned on GitHub
yukezhu/tensorflow-reinforce	-	Mentioned on GitHub
hill-a/stable-baselines	-	-
grananqvist/reinforcement-learning-super-mario-A3C	-	Mentioned on GitHub
chainer/chainerrl	pytorch	Mentioned on GitHub
JulT1/RL_SS19	-	Mentioned on GitHub
Remtasya/DDPG-Actor-Critic-Reinforcement-Learning-Reacher-Environment	pytorch	Mentioned on GitHub
Zartris/TD3_continuous_control	pytorch	Mentioned on GitHub
Jzar/Space-Invaders-DQN	-	Mentioned on GitHub
Sheepsody/Batched-Impala-PyTorch	pytorch	Mentioned on GitHub
ikostrikov/pytorch-rl	pytorch	Mentioned on GitHub
roop-pal/Meta-Learning-for-StarCraft-II-Minigames	-	Mentioned on GitHub
vladfi1/universe-starter-agent	-	Mentioned on GitHub
PaulCharnay/Projet_AIF	-	Mentioned on GitHub
ikostrikov/pytorch-a3c	pytorch	Mentioned on GitHub
tensorlayer/RLzoo	-	Mentioned on GitHub
liuyuezhang/pyrl	pytorch	Mentioned on GitHub
danielpolimac/Ispit_Inteligentni_Agenti	-	Mentioned on GitHub
sainijagjit/A3C-Pytorch	pytorch	Mentioned on GitHub
dsinghnegi/atari_RL_agent	pytorch	Mentioned on GitHub
braemt/attentive-multi-task-deep-reinforcement-learning	-	Mentioned on GitHub
brett-daley/fast-dqn	-	Mentioned on GitHub
qihongl/demo-advantage-actor-critic	pytorch	Mentioned on GitHub
4rChon/NL-FuN	-	Mentioned on GitHub
lcswillems/torch-ac	pytorch	Mentioned on GitHub
InSpaceAI/RL-Zoo	-	Mentioned on GitHub
arnomoonens/yarll	-	-
khanhptnk/bandit-nmt	pytorch	Mentioned on GitHub
openai/universe-starter-agent	-	Mentioned on GitHub
amaudruz/RL_openaigym	pytorch	Mentioned on GitHub
bentrevett/pytorch-rl	pytorch	Mentioned on GitHub
qihongl/dlstm-demo	pytorch	Mentioned on GitHub
marload/DeepRL-TensorFlow2	-	Mentioned on GitHub
tensorpack/tensorpack/tree/master/examples/A3C-Gym	-	Mentioned on GitHub
DLR-RM/stable-baselines3	pytorch	-
cdesilv1/sc2_ai_cdes	-	Mentioned on GitHub

Benchmarks

Benchmark	Method	Metric
atari-games-on-atari-2600-alien	A3C LSTM hs	Score: 945.3
atari-games-on-atari-2600-alien	A3C FF hs	Score: 518.4
atari-games-on-atari-2600-alien	A3C FF (1 day) hs	Score: 182.1
atari-games-on-atari-2600-amidar	A3C FF (1 day) hs	Score: 283.9
atari-games-on-atari-2600-amidar	A3C LSTM hs	Score: 173.0
atari-games-on-atari-2600-amidar	A3C FF hs	Score: 263.9
atari-games-on-atari-2600-assault	A3C LSTM hs	Score: 14497.9
atari-games-on-atari-2600-assault	A3C FF hs	Score: 5474.9
atari-games-on-atari-2600-assault	A3C FF (1 day) hs	Score: 3746.1
atari-games-on-atari-2600-asterix	A3C FF hs	Score: 22140.5
atari-games-on-atari-2600-asterix	A3C LSTM hs	Score: 17244.5
atari-games-on-atari-2600-asterix	A3C FF (1 day) hs	Score: 6723
atari-games-on-atari-2600-asteroids	A3C LSTM hs	Score: 5093.1
atari-games-on-atari-2600-asteroids	A3C FF (1 day) hs	Score: 3009.4
atari-games-on-atari-2600-asteroids	A3C FF hs	Score: 4474.5
atari-games-on-atari-2600-atlantis	A3C LSTM hs	Score: 875822.0
atari-games-on-atari-2600-atlantis	A3C FF hs	Score: 911091.0
atari-games-on-atari-2600-atlantis	A3C FF (1 day) hs	Score: 772392.0
atari-games-on-atari-2600-bank-heist	A3C LSTM hs	Score: 932.8
atari-games-on-atari-2600-bank-heist	A3C FF (1 day) hs	Score: 946.0
atari-games-on-atari-2600-bank-heist	A3C FF hs	Score: 970.1
atari-games-on-atari-2600-battle-zone	A3C FF hs	Score: 12950.0
atari-games-on-atari-2600-battle-zone	A3C FF (1 day) hs	Score: 11340.0
atari-games-on-atari-2600-battle-zone	A3C LSTM hs	Score: 20760.0
atari-games-on-atari-2600-beam-rider	A3C LSTM hs	Score: 24622.2
atari-games-on-atari-2600-beam-rider	A3C FF (1 day) hs	Score: 13235.9
atari-games-on-atari-2600-beam-rider	A3C FF hs	Score: 22707.9
atari-games-on-atari-2600-berzerk	A3C FF (1 day) hs	Score: 1433.4
atari-games-on-atari-2600-berzerk	A3C FF hs	Score: 817.9
atari-games-on-atari-2600-berzerk	A3C LSTM hs	Score: 862.2
atari-games-on-atari-2600-bowling	A3C LSTM hs	Score: 41.8
atari-games-on-atari-2600-bowling	A3C FF hs	Score: 35.1
atari-games-on-atari-2600-bowling	A3C FF (1 day) hs	Score: 36.2
atari-games-on-atari-2600-boxing	A3C LSTM hs	Score: 37.3
atari-games-on-atari-2600-boxing	A3C FF hs	Score: 59.8
atari-games-on-atari-2600-boxing	A3C FF (1 day) hs	Score: 33.7
atari-games-on-atari-2600-breakout	A3C FF (1 day) hs	Score: 551.6
atari-games-on-atari-2600-breakout	A3C LSTM hs	Score: 766.8
atari-games-on-atari-2600-breakout	A3C FF hs	Score: 681.9
atari-games-on-atari-2600-centipede	A3C FF (1 day) hs	Score: 3306.5
atari-games-on-atari-2600-centipede	A3C LSTM hs	Score: 1997.0
atari-games-on-atari-2600-centipede	A3C FF hs	Score: 3755.8
atari-games-on-atari-2600-chopper-command	A3C LSTM hs	Score: 10150.0
atari-games-on-atari-2600-chopper-command	A3C FF (1 day) hs	Score: 4669.0
atari-games-on-atari-2600-chopper-command	A3C FF hs	Score: 7021.0
atari-games-on-atari-2600-crazy-climber	A3C FF (1 day) hs	Score: 101624.0
atari-games-on-atari-2600-crazy-climber	A3C FF hs	Score: 112646.0
atari-games-on-atari-2600-crazy-climber	A3C LSTM hs	Score: 138518.0
atari-games-on-atari-2600-demon-attack	A3C FF (1 day) hs	Score: 84997.5
atari-games-on-atari-2600-demon-attack	A3C LSTM hs	Score: 115201.9
atari-games-on-atari-2600-demon-attack	A3C FF hs	Score: 113308.4
atari-games-on-atari-2600-double-dunk	A3C FF (1 day) hs	Score: 0.1
atari-games-on-atari-2600-double-dunk	A3C FF hs	Score: -0.1
atari-games-on-atari-2600-double-dunk	A3C LSTM hs	Score: 0.1
atari-games-on-atari-2600-enduro	A3C FF hs	Score: -82.5
atari-games-on-atari-2600-enduro	A3C LSTM hs	Score: -82.5
atari-games-on-atari-2600-enduro	A3C FF (1 day) hs	Score: -82.2
atari-games-on-atari-2600-fishing-derby	A3C FF hs	Score: 18.8
atari-games-on-atari-2600-fishing-derby	A3C LSTM hs	Score: 22.6
atari-games-on-atari-2600-fishing-derby	A3C FF (1 day) hs	Score: 13.6
atari-games-on-atari-2600-freeway	A3C FF (1 day) hs	Score: 0.1
atari-games-on-atari-2600-freeway	A3C FF hs	Score: 0.1
atari-games-on-atari-2600-freeway	A3C LSTM hs	Score: 0.1
atari-games-on-atari-2600-frostbite	A3C LSTM hs	Score: 197.6
atari-games-on-atari-2600-frostbite	A3C FF hs	Score: 190.5
atari-games-on-atari-2600-frostbite	A3C FF (1 day) hs	Score: 180.1
atari-games-on-atari-2600-gopher	A3C FF hs	Score: 10022.8
atari-games-on-atari-2600-gopher	A3C LSTM hs	Score: 17106.8
atari-games-on-atari-2600-gopher	A3C FF (1 day) hs	Score: 8442.8
atari-games-on-atari-2600-gravitar	A3C LSTM hs	Score: 320.0
atari-games-on-atari-2600-gravitar	A3C FF hs	Score: 303.5
atari-games-on-atari-2600-gravitar	A3C FF (1 day) hs	Score: 269.5
atari-games-on-atari-2600-hero	A3C FF hs	Score: 32464.1
atari-games-on-atari-2600-hero	A3C LSTM hs	Score: 28889.5
atari-games-on-atari-2600-hero	A3C FF (1 day) hs	Score: 28765.8
atari-games-on-atari-2600-ice-hockey	A3C LSTM hs	Score: -1.7
atari-games-on-atari-2600-ice-hockey	A3C FF (1 day) hs	Score: -4.7
atari-games-on-atari-2600-ice-hockey	A3C FF hs	Score: -2.8
atari-games-on-atari-2600-james-bond	A3C FF (1 day) hs	Score: 351.5
atari-games-on-atari-2600-james-bond	A3C LSTM hs	Score: 613.0
atari-games-on-atari-2600-james-bond	A3C FF hs	Score: 541.0
atari-games-on-atari-2600-kangaroo	A3C FF hs	Score: 94.0
atari-games-on-atari-2600-kangaroo	A3C FF (1 day) hs	Score: 106.0
atari-games-on-atari-2600-kangaroo	A3C LSTM hs	Score: 125.0
atari-games-on-atari-2600-krull	A3C FF hs	Score: 5560.0
atari-games-on-atari-2600-krull	A3C LSTM hs	Score: 5911.4
atari-games-on-atari-2600-krull	A3C FF (1 day) hs	Score: 8066.6
atari-games-on-atari-2600-kung-fu-master	A3C LSTM hs	Score: 40835.0
atari-games-on-atari-2600-kung-fu-master	A3C FF (1 day) hs	Score: 3046.0
atari-games-on-atari-2600-kung-fu-master	A3C FF hs	Score: 28819.0
atari-games-on-atari-2600-montezumas-revenge	A3C FF (1 day) hs	Score: 53
atari-games-on-atari-2600-montezumas-revenge	A3C FF hs	Score: 67
atari-games-on-atari-2600-montezumas-revenge	A3C LSTM hs	Score: 41
atari-games-on-atari-2600-ms-pacman	A3C FF hs	Score: 653.7
atari-games-on-atari-2600-ms-pacman	A3C LSTM hs	Score: 850.7
atari-games-on-atari-2600-ms-pacman	A3C FF (1 day) hs	Score: 594.4
atari-games-on-atari-2600-name-this-game	A3C LSTM hs	Score: 12093.7
atari-games-on-atari-2600-name-this-game	A3C FF hs	Score: 10476.1
atari-games-on-atari-2600-name-this-game	A3C FF (1 day) hs	Score: 5614.0
atari-games-on-atari-2600-pong	A3C FF (1 day) hs	Score: 11.4
atari-games-on-atari-2600-pong	A3C LSTM hs	Score: 10.7
atari-games-on-atari-2600-pong	A3C FF hs	Score: 5.6
atari-games-on-atari-2600-private-eye	A3C FF hs	Score: 206.9
atari-games-on-atari-2600-private-eye	A3C LSTM hs	Score: 421.1
atari-games-on-atari-2600-private-eye	A3C FF (1 day) hs	Score: 194.4
atari-games-on-atari-2600-qbert	A3C LSTM hs	Score: 21307.5
atari-games-on-atari-2600-qbert	A3C FF hs	Score: 15148.8
atari-games-on-atari-2600-qbert	A3C FF (1 day) hs	Score: 13752.3
atari-games-on-atari-2600-river-raid	A3C LSTM hs	Score: 6591.9
atari-games-on-atari-2600-river-raid	A3C FF hs	Score: 12201.8
atari-games-on-atari-2600-river-raid	A3C FF (1 day) hs	Score: 10001.2
atari-games-on-atari-2600-road-runner	A3C LSTM hs	Score: 73949.0
atari-games-on-atari-2600-road-runner	A3C FF hs	Score: 34216.0
atari-games-on-atari-2600-road-runner	A3C FF (1 day) hs	Score: 31769.0
atari-games-on-atari-2600-robotank	A3C LSTM hs	Score: 2.6
atari-games-on-atari-2600-robotank	A3C FF hs	Score: 32.8
atari-games-on-atari-2600-robotank	A3C FF (1 day) hs	Score: 2.3
atari-games-on-atari-2600-seaquest	A3C FF (1 day) hs	Score: 2300.2
atari-games-on-atari-2600-seaquest	A3C LSTM hs	Score: 1326.1
atari-games-on-atari-2600-seaquest	A3C FF hs	Score: 2355.4
atari-games-on-atari-2600-space-invaders	A3C FF (1 day) hs	Score: 2214.7
atari-games-on-atari-2600-space-invaders	A3C FF hs	Score: 15730.5
atari-games-on-atari-2600-space-invaders	A3C LSTM hs	Score: 23846.0
atari-games-on-atari-2600-star-gunner	A3C FF (1 day) hs	Score: 64393.0
atari-games-on-atari-2600-star-gunner	A3C LSTM hs	Score: 164766.0
atari-games-on-atari-2600-star-gunner	A3C FF hs	Score: 138218.0
atari-games-on-atari-2600-tennis	A3C LSTM hs	Score: -6.4
atari-games-on-atari-2600-tennis	A3C FF hs	Score: -6.3
atari-games-on-atari-2600-tennis	A3C FF (1 day) hs	Score: -10.2
atari-games-on-atari-2600-time-pilot	A3C FF hs	Score: 12679.0
atari-games-on-atari-2600-time-pilot	A3C LSTM hs	Score: 27202.0
atari-games-on-atari-2600-time-pilot	A3C FF (1 day) hs	Score: 5825.0
atari-games-on-atari-2600-tutankham	A3C LSTM hs	Score: 144.2
atari-games-on-atari-2600-tutankham	A3C FF hs	Score: 156.3
atari-games-on-atari-2600-tutankham	A3C FF (1 day) hs	Score: 26.1
atari-games-on-atari-2600-up-and-down	A3C FF hs	Score: 74705.7
atari-games-on-atari-2600-up-and-down	A3C FF (1 day) hs	Score: 54525.4
atari-games-on-atari-2600-up-and-down	A3C LSTM hs	Score: 105728.7
atari-games-on-atari-2600-venture	A3C LSTM hs	Score: 25.0
atari-games-on-atari-2600-venture	A3C FF (1 day) hs	Score: 19.0
atari-games-on-atari-2600-venture	A3C FF hs	Score: 23.0
atari-games-on-atari-2600-video-pinball	A3C FF (1 day) hs	Score: 185852.6
atari-games-on-atari-2600-video-pinball	A3C FF hs	Score: 331628.1
atari-games-on-atari-2600-video-pinball	A3C LSTM hs	Score: 470310.5
atari-games-on-atari-2600-wizard-of-wor	A3C FF (1 day) hs	Score: 5278.0
atari-games-on-atari-2600-wizard-of-wor	A3C LSTM hs	Score: 18082.0
atari-games-on-atari-2600-wizard-of-wor	A3C FF hs	Score: 17244.0
atari-games-on-atari-2600-zaxxon	A3C FF (1 day) hs	Score: 2659.0
atari-games-on-atari-2600-zaxxon	A3C FF hs	Score: 24622.0
atari-games-on-atari-2600-zaxxon	A3C LSTM hs	Score: 23519.0
