5 months ago

Deep Reinforcement Learning with Double Q-learning

Hado van Hasselt; Arthur Guez; David Silver

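Abstract

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether such overestimations are common in practice, whether they harm performance, and whether they can generally be prevented. This paper answers all three questions affirmatively. Specifically, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, originally introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation of the DQN algorithm and show that it not only reduces the observed overestimations, as hypothesized, but also leads to much better performance on several games.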
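The difference between the two bootstrap targets can be made concrete with a small sketch. In the standard DQN target, the target network both selects and evaluates the next action through a single max; in the Double DQN adaptation, the online network selects the action and the target network evaluates it. The function names, argument names, and numbers below are illustrative assumptions, not taken from any of the listed repositories, and the action values are made up for a single next state with three actions.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float, done: bool,
               q_target_next: Sequence[float]) -> float:
    """Standard DQN target: the target network selects AND evaluates."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    # Action selection uses the online network's estimates.
    best_action = max(range(len(q_online_next)),
                      key=q_online_next.__getitem__)
    # Action evaluation uses the target network's estimate of that action.
    return reward + gamma * q_target_next[best_action]


# Hypothetical action values for one next state (three actions).
q_online_next = [1.0, 2.5, 2.0]
q_target_next = [1.2, 1.8, 3.0]

y_dqn = dqn_target(1.0, 0.5, False, q_target_next)
y_ddqn = double_dqn_target(1.0, 0.5, False, q_online_next, q_target_next)
print(y_dqn, y_ddqn)
```

With these made-up values the standard target bootstraps from the largest target-network estimate (3.0), while the double target bootstraps from the target network's estimate of the action the online network prefers (1.8), so the double target is the smaller of the two. Because the evaluated action is one of the actions the max ranges over, the double target can never exceed the standard one for the same inputs.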

Code Repositories

Repository	Framework	Mentioned in GitHub
ianlimle/ItsMeMario	pytorch	yes
wtingda/DeepRLBreakout	-	yes
wmol4/Pytorch_DDQN_Unity_Navigation	pytorch	yes
deepmind/rlax	jax	-
nbopardi/smb	-	yes
shehrum/RL_Navigation	pytorch	yes
pathway/alphaxos	-	yes
Rabrg/dqn	pytorch	yes
hemilpanchiwala/Dueling_Network_Architectures	pytorch	yes
Denbergvanthijs/imbDRL	-	yes
toni-sm/skrl	jax	-
aman-khurana/deep-q-learning	-	yes
jadag/DDQN_mario	-	yes
daviddcho/supermario	pytorch	yes
dxyang/DQN_pytorch	pytorch	yes
cove9988/TradingGym	-	yes
utarumo/RL_implementation	-	yes
zhengant/dqn_reversi	-	yes
Roman-Kozachek/TradeBot	-	yes
ifestus/rl	-	yes
PeterJochem/Double_Deep_QLearning	-	yes
lbenbaccar/Deep-Reinforcement-Learning-with-Double-Q-Learning	-	yes
mindspore-courses/Deep-Reinforcement-Learning-Algorithms-with-MindSpore	mindspore	yes
guillaumeboniface/bananaland	pytorch	yes
MaximeVandegar/Papers-in-100-Lines-of-Code/tree/main/Deep_Reinforcement_Learning_with_Double_Q_learning	pytorch	-
Montherapy/Deep-reinforcement-learning-for-multi-class-imbalanced-classification	-	yes
JonasRSV/DQN	-	yes
hill-a/stable-baselines	-	-
yukezhu/tensorflow-reinforce	-	yes
molomono/CartPole_Optimized_DDQN	-	yes
chainer/chainerrl	pytorch	yes
botforge/simplementation	pytorch	yes
anh-nn01/Lunar-Lander-Double-Deep-Q-Networks	-	yes
cocolico14/N-step-Dueling-DDQN-PER-Pacman	-	yes
near32/regym	pytorch	yes
1jsingh/rl_navigation	pytorch	yes
NikolausBerl/Udacity_DRLN_Navigation_Project	pytorch	yes
matthewsparr/Deep-Zork	-	yes
jezzarax/drlnd_p1_navigation	pytorch	yes
tensorlayer/RLzoo	-	yes
fengsterooni/dql	pytorch	yes
rybread1/deep-rl-trex	-	yes
Codernauti/Exploration-of-DQN-in-CartPole-Environment	-	yes
kshitij-ingale/Reinforcement-Learning	-	yes
paintception/Deep-Quality-Value-Family-	-	yes
KelvinYang0320/deepbots-panda	pytorch	yes
hamishs/JAX-RL	jax	yes
kochlisGit/autonomous-vehicles-agent	-	yes
seacevedo/ReinforcementLearningProjects	pytorch	yes
amirmirzaei79/CartPole-DQN-And-DDQN	pytorch	yes
MohammadAsadolahi/DDQN_Deep-Double_Q-Learning-for-solving-OpenAi-Gym-LunarLander-v2-in-python	-	yes
Adrelf/DRL-navigation	pytorch	yes
ZainRaza14/deepRL	pytorch	yes
RandyDeng/gym_connect4	-	yes
PeterJochem/Deep_RL	-	yes
MEOWMEOW114/nd893-p1-navigation-banana	pytorch	yes
mindspore-courses/Rainbow-MindSpore	mindspore	yes
moduIo/Deep-Q-network	-	yes
tensorpack/tensorpack/tree/master/examples/DeepQNetwork	-	yes
kmdanielduan/DQN_Family_PyTorch	pytorch	yes
jvoynow/DQN-analysis-with-2048	-	yes
JustinStitt/acrobotDDQN	pytorch	yes
rybread1/DeepRlTrex	-	yes
hemilpanchiwala/Dueling-Network-Architectures	pytorch	yes
tkcoding/Stock_DRL	pytorch	yes
labmlai/annotated_deep_learning_paper_implementations	pytorch	-
OMS1996/Carla_The_RL_Self-Driving-Car	-	yes
jihoonerd/Deep-Reinforcement-Learning-with-Double-Q-learning	-	yes
gznyyb/deep_reinforcement_learning_Pong	-	yes
Curt-Park/rainbow-is-all-you-need	-	yes
paintception/Deep-Quality-Value-Family	-	yes
jeffery1236/Atari_DoubleDeepQNetwork	pytorch	yes
tkcoding/DeepRL	pytorch	yes
opendilab/DI-engine	pytorch	-
mohit8935/Deep-Q-Learning-Paper	pytorch	yes
JonasRSV/DQNTensorflow	-	yes
yzheng51/rl-dino-run	pytorch	yes
OscarHuangWind/Preference-Guided-DQN-Atari	pytorch	-
HussonnoisMaxence/RL_Algorithms	pytorch	yes
facebookresearch/ReAgent	pytorch	yes
mightypirate1/DRL-Tetris	-	yes
ssainz/reinforcement_learning_algorithms	pytorch	yes
xtma/simple-pytorch-rl	pytorch	yes
puppetect/TradingBot-tensorflow	-	yes
shashwatsaxena571/DRL-navigation	pytorch	yes
xgfelicia/Reinforcement-Learning	pytorch	yes
austinsilveria/Banana-Collection-DQN	pytorch	yes
YuansongFeng/MadMario	pytorch	yes
chandar-lab/RLHive	pytorch	-
philtabor/Deep-Q-Learning-Paper-To-Code	pytorch	yes
marload/DeepRL-TensorFlow2	-	yes
atavakol/action-branching-agents	-	yes
microsoft/med-deadend	pytorch	yes
SayhoKim/tetrisRL	-	yes
FaboNo/DRLND	pytorch	yes
yaxinchen666/dce_pricingRL	-	yes
MOVzeroOne/DQN	pytorch	yes

Benchmarks

Benchmark	Method	Metric
atari-games-on-atari-2600-alien	DDQN (tuned) hs	Score: 1033.4
atari-games-on-atari-2600-alien	DQN hs	Score: 634.0
atari-games-on-atari-2600-alien	Prior+Duel hs	Score: 823.7
atari-games-on-atari-2600-alien	DQN noop	Score: 1620.0
atari-games-on-atari-2600-amidar	Prior+Duel hs	Score: 238.4
atari-games-on-atari-2600-amidar	DQN noop	Score: 978.0
atari-games-on-atari-2600-amidar	DDQN (tuned) hs	Score: 169.1
atari-games-on-atari-2600-amidar	DQN hs	Score: 178.4
atari-games-on-atari-2600-assault	DQN noop	Score: 4280.4
atari-games-on-atari-2600-assault	DDQN (tuned) hs	Score: 6060.8
atari-games-on-atari-2600-assault	Prior+Duel hs	Score: 10950.6
atari-games-on-atari-2600-assault	DQN hs	Score: 3489.3
atari-games-on-atari-2600-asterix	DDQN (tuned) hs	Score: 16837.0
atari-games-on-atari-2600-asterix	DQN hs	Score: 3170.5
atari-games-on-atari-2600-asterix	Prior+Duel hs	Score: 364200.0
atari-games-on-atari-2600-asterix	DQN noop	Score: 4359.0
atari-games-on-atari-2600-asteroids	DQN noop	Score: 1364.5
atari-games-on-atari-2600-asteroids	Prior+Duel hs	Score: 1021.9
atari-games-on-atari-2600-asteroids	DQN hs	Score: 1458.7
atari-games-on-atari-2600-asteroids	DDQN (tuned) hs	Score: 1193.2
atari-games-on-atari-2600-atlantis	DDQN (tuned) hs	Score: 319688.0
atari-games-on-atari-2600-atlantis	DQN hs	Score: 292491.0
atari-games-on-atari-2600-atlantis	DQN noop	Score: 279987.0
atari-games-on-atari-2600-atlantis	Prior+Duel hs	Score: 423252.0
atari-games-on-atari-2600-bank-heist	Prior+Duel hs	Score: 1004.6
atari-games-on-atari-2600-bank-heist	DQN hs	Score: 312.7
atari-games-on-atari-2600-bank-heist	DDQN (tuned) hs	Score: 886.0
atari-games-on-atari-2600-bank-heist	DQN noop	Score: 455.0
atari-games-on-atari-2600-battle-zone	Prior+Duel hs	Score: 30650.0
atari-games-on-atari-2600-battle-zone	DQN noop	Score: 29900.0
atari-games-on-atari-2600-battle-zone	DQN hs	Score: 23750.0
atari-games-on-atari-2600-battle-zone	DDQN (tuned) hs	Score: 24740.0
atari-games-on-atari-2600-beam-rider	DQN hs	Score: 9743.2
atari-games-on-atari-2600-beam-rider	Prior+Duel hs	Score: 37412.2
atari-games-on-atari-2600-beam-rider	DDQN (tuned) hs	Score: 17417.2
atari-games-on-atari-2600-beam-rider	DQN noop	Score: 8627.5
atari-games-on-atari-2600-berzerk	DDQN (tuned) hs	Score: 1011.1
atari-games-on-atari-2600-berzerk	DQN noop	Score: 585.6
atari-games-on-atari-2600-berzerk	Prior+Duel hs	Score: 2178.6
atari-games-on-atari-2600-berzerk	DQN hs	Score: 493.4
atari-games-on-atari-2600-bowling	DDQN (tuned) hs	Score: 69.6
atari-games-on-atari-2600-bowling	DQN noop	Score: 50.4
atari-games-on-atari-2600-bowling	Prior+Duel hs	Score: 50.4
atari-games-on-atari-2600-bowling	DQN hs	Score: 56.5
atari-games-on-atari-2600-boxing	DQN hs	Score: 70.3
atari-games-on-atari-2600-boxing	DDQN (tuned) hs	Score: 73.5
atari-games-on-atari-2600-boxing	Prior+Duel hs	Score: 79.2
atari-games-on-atari-2600-boxing	DQN noop	Score: 88.0
atari-games-on-atari-2600-breakout	Prior+Duel hs	Score: 354.6
atari-games-on-atari-2600-breakout	DQN hs	Score: 354.5
atari-games-on-atari-2600-breakout	DQN noop	Score: 385.5
atari-games-on-atari-2600-breakout	DDQN (tuned) hs	Score: 368.9
atari-games-on-atari-2600-centipede	DQN noop	Score: 4657.7
atari-games-on-atari-2600-centipede	Prior+Duel hs	Score: 5570.2
atari-games-on-atari-2600-centipede	DQN hs	Score: 3973.9
atari-games-on-atari-2600-centipede	DDQN (tuned) hs	Score: 3853.5
atari-games-on-atari-2600-chopper-command	DQN noop	Score: 6126.0
atari-games-on-atari-2600-chopper-command	Prior+Duel hs	Score: 8058.0
atari-games-on-atari-2600-chopper-command	DQN hs	Score: 5017.0
atari-games-on-atari-2600-chopper-command	DDQN (tuned) hs	Score: 3495.0
atari-games-on-atari-2600-crazy-climber	DQN noop	Score: 110763.0
atari-games-on-atari-2600-crazy-climber	DDQN (tuned) hs	Score: 113782.0
atari-games-on-atari-2600-crazy-climber	DQN hs	Score: 98128.0
atari-games-on-atari-2600-crazy-climber	Prior+Duel hs	Score: 127853.0
atari-games-on-atari-2600-demon-attack	DQN hs	Score: 12550.7
atari-games-on-atari-2600-demon-attack	DQN noop	Score: 12149.4
atari-games-on-atari-2600-demon-attack	DDQN (tuned) hs	Score: 69803.4
atari-games-on-atari-2600-demon-attack	Prior+Duel hs	Score: 73371.3
atari-games-on-atari-2600-double-dunk	DDQN (tuned) hs	Score: -0.3
atari-games-on-atari-2600-double-dunk	Prior+Duel hs	Score: -10.7
atari-games-on-atari-2600-double-dunk	DQN noop	Score: -6.6
atari-games-on-atari-2600-double-dunk	DQN hs	Score: -6.0
atari-games-on-atari-2600-enduro	DQN hs	Score: 626.7
atari-games-on-atari-2600-enduro	DQN noop	Score: 729.0
atari-games-on-atari-2600-enduro	DDQN (tuned) hs	Score: 1216.6
atari-games-on-atari-2600-enduro	Prior+Duel hs	Score: 2223.9
atari-games-on-atari-2600-fishing-derby	DQN noop	Score: -4.9
atari-games-on-atari-2600-fishing-derby	DDQN (tuned) hs	Score: 3.2
atari-games-on-atari-2600-fishing-derby	DQN hs	Score: -1.6
atari-games-on-atari-2600-fishing-derby	Prior+Duel hs	Score: 17.0
atari-games-on-atari-2600-freeway	DDQN (tuned) hs	Score: 28.8
atari-games-on-atari-2600-freeway	Prior+Duel hs	Score: 28.2
atari-games-on-atari-2600-freeway	DQN hs	Score: 26.9
atari-games-on-atari-2600-freeway	DQN noop	Score: 30.8
atari-games-on-atari-2600-frostbite	DQN noop	Score: 797.4
atari-games-on-atari-2600-frostbite	Prior+Duel hs	Score: 4038.4
atari-games-on-atari-2600-frostbite	DQN hs	Score: 496.1
atari-games-on-atari-2600-frostbite	DDQN (tuned) hs	Score: 1448.1
atari-games-on-atari-2600-gopher	DQN noop	Score: 8777.4
atari-games-on-atari-2600-gopher	DDQN (tuned) hs	Score: 15253.0
atari-games-on-atari-2600-gopher	Prior+Duel hs	Score: 105148.4
atari-games-on-atari-2600-gopher	DQN hs	Score: 8190.4
atari-games-on-atari-2600-gravitar	DQN hs	Score: 298.0
atari-games-on-atari-2600-gravitar	DQN noop	Score: 473.0
atari-games-on-atari-2600-gravitar	DDQN (tuned) hs	Score: 200.5
atari-games-on-atari-2600-gravitar	Prior+Duel hs	Score: 167.0
atari-games-on-atari-2600-hero	DQN noop	Score: 20437.8
atari-games-on-atari-2600-hero	Prior+Duel hs	Score: 15459.2
atari-games-on-atari-2600-hero	DQN hs	Score: 14992.9
atari-games-on-atari-2600-hero	DDQN (tuned) hs	Score: 14892.5
atari-games-on-atari-2600-ice-hockey	Prior+Duel hs	Score: 0.5
atari-games-on-atari-2600-ice-hockey	DQN hs	Score: -1.6
atari-games-on-atari-2600-ice-hockey	DQN noop	Score: -1.9
atari-games-on-atari-2600-ice-hockey	DDQN (tuned) hs	Score: -2.5
atari-games-on-atari-2600-james-bond	DQN hs	Score: 697.5
atari-games-on-atari-2600-james-bond	DQN noop	Score: 768.5
atari-games-on-atari-2600-james-bond	Prior+Duel hs	Score: 585.0
atari-games-on-atari-2600-james-bond	DDQN (tuned) hs	Score: 573.0
atari-games-on-atari-2600-kangaroo	DQN hs	Score: 4496.0
atari-games-on-atari-2600-kangaroo	DDQN (tuned) hs	Score: 11204.0
atari-games-on-atari-2600-kangaroo	Prior+Duel hs	Score: 861.0
atari-games-on-atari-2600-kangaroo	DQN noop	Score: 7259.0
atari-games-on-atari-2600-krull	Prior+Duel hs	Score: 7658.6
atari-games-on-atari-2600-krull	DQN noop	Score: 8422.3
atari-games-on-atari-2600-krull	DDQN (tuned) hs	Score: 6796.1
atari-games-on-atari-2600-krull	DQN hs	Score: 6206.0
atari-games-on-atari-2600-kung-fu-master	DQN hs	Score: 20882.0
atari-games-on-atari-2600-kung-fu-master	DDQN (tuned) hs	Score: 30207.0
atari-games-on-atari-2600-kung-fu-master	DQN noop	Score: 26059.0
atari-games-on-atari-2600-kung-fu-master	Prior+Duel hs	Score: 37484.0
atari-games-on-atari-2600-montezumas-revenge	Prior+Duel hs	Score: 24.0
atari-games-on-atari-2600-montezumas-revenge	DQN hs	Score: 47.0
atari-games-on-atari-2600-montezumas-revenge	DDQN (tuned) hs	Score: 42.0
atari-games-on-atari-2600-ms-pacman	DQN noop	Score: 3085.6
atari-games-on-atari-2600-ms-pacman	DDQN (tuned) hs	Score: 1241.3
atari-games-on-atari-2600-ms-pacman	Prior+Duel hs	Score: 1007.8
atari-games-on-atari-2600-ms-pacman	DQN hs	Score: 1092.3
atari-games-on-atari-2600-name-this-game	DQN noop	Score: 8207.8
atari-games-on-atari-2600-name-this-game	Prior+Duel hs	Score: 13637.9
atari-games-on-atari-2600-name-this-game	DDQN (tuned) hs	Score: 8960.3
atari-games-on-atari-2600-name-this-game	DQN hs	Score: 6738.8
atari-games-on-atari-2600-pong	DQN noop	Score: 19.5
atari-games-on-atari-2600-pong	DQN hs	Score: 18.0
atari-games-on-atari-2600-pong	Prior+Duel hs	Score: 18.4
atari-games-on-atari-2600-pong	DDQN (tuned) hs	Score: 19.1
atari-games-on-atari-2600-private-eye	DDQN (tuned) hs	Score: -575.5
atari-games-on-atari-2600-private-eye	DQN noop	Score: 146.7
atari-games-on-atari-2600-private-eye	Prior+Duel hs	Score: 1277.6
atari-games-on-atari-2600-private-eye	DQN hs	Score: 207.9
atari-games-on-atari-2600-qbert	Prior+Duel hs	Score: 14063.0
atari-games-on-atari-2600-qbert	DQN hs	Score: 9271.5
atari-games-on-atari-2600-qbert	DDQN (tuned) hs	Score: 11020.8
atari-games-on-atari-2600-qbert	DQN noop	Score: 13117.3
atari-games-on-atari-2600-river-raid	DQN hs	Score: 4748.5
atari-games-on-atari-2600-river-raid	DDQN (tuned) hs	Score: 10838.4
atari-games-on-atari-2600-river-raid	DQN noop	Score: 7377.6
atari-games-on-atari-2600-river-raid	Prior+Duel hs	Score: 16496.8
atari-games-on-atari-2600-road-runner	DQN hs	Score: 35215.0
atari-games-on-atari-2600-road-runner	Prior+Duel hs	Score: 54630.0
atari-games-on-atari-2600-road-runner	DDQN (tuned) hs	Score: 43156.0
atari-games-on-atari-2600-road-runner	DQN noop	Score: 39544.0
atari-games-on-atari-2600-robotank	DQN hs	Score: 58.7
atari-games-on-atari-2600-robotank	Prior+Duel hs	Score: 24.7
atari-games-on-atari-2600-robotank	DDQN (tuned) hs	Score: 59.1
atari-games-on-atari-2600-robotank	DQN noop	Score: 63.9
atari-games-on-atari-2600-seaquest	DQN hs	Score: 4216.7
atari-games-on-atari-2600-seaquest	DDQN (tuned) hs	Score: 14498.0
atari-games-on-atari-2600-seaquest	DQN noop	Score: 5860.6
atari-games-on-atari-2600-seaquest	Prior+Duel hs	Score: 1431.2
atari-games-on-atari-2600-space-invaders	DQN noop	Score: 1692.3
atari-games-on-atari-2600-space-invaders	Prior+Duel hs	Score: 8978.0
atari-games-on-atari-2600-space-invaders	DQN hs	Score: 1293.8
atari-games-on-atari-2600-space-invaders	DDQN (tuned) hs	Score: 2628.7
atari-games-on-atari-2600-star-gunner	DDQN (tuned) hs	Score: 58365.0
atari-games-on-atari-2600-star-gunner	DQN noop	Score: 54282.0
atari-games-on-atari-2600-star-gunner	DQN hs	Score: 52970.0
atari-games-on-atari-2600-star-gunner	Prior+Duel hs	Score: 127073.0
atari-games-on-atari-2600-tennis	DQN noop	Score: 12.2
atari-games-on-atari-2600-tennis	DQN hs	Score: 11.1
atari-games-on-atari-2600-tennis	Prior+Duel hs	Score: -13.2
atari-games-on-atari-2600-tennis	DDQN (tuned) hs	Score: -7.8
atari-games-on-atari-2600-time-pilot	Prior+Duel hs	Score: 4871.0
atari-games-on-atari-2600-time-pilot	DQN noop	Score: 4870.0
atari-games-on-atari-2600-time-pilot	DDQN (tuned) hs	Score: 6608.0
atari-games-on-atari-2600-time-pilot	DQN hs	Score: 4786.0
atari-games-on-atari-2600-tutankham	DQN hs	Score: 45.6
atari-games-on-atari-2600-tutankham	DDQN (tuned) hs	Score: 92.2
atari-games-on-atari-2600-tutankham	DQN noop	Score: 68.1
atari-games-on-atari-2600-tutankham	Prior+Duel hs	Score: 108.6
atari-games-on-atari-2600-up-and-down	DQN noop	Score: 9989.9
atari-games-on-atari-2600-up-and-down	DQN hs	Score: 8038.5
atari-games-on-atari-2600-up-and-down	Prior+Duel hs	Score: 22681.3
atari-games-on-atari-2600-up-and-down	DDQN (tuned) hs	Score: 19086.9
atari-games-on-atari-2600-venture	DQN hs	Score: 136.0
atari-games-on-atari-2600-venture	DDQN (tuned) hs	Score: 21.0
atari-games-on-atari-2600-venture	Prior+Duel hs	Score: 29.0
atari-games-on-atari-2600-venture	DQN noop	Score: 163.0
atari-games-on-atari-2600-video-pinball	DQN hs	Score: 154414.1
atari-games-on-atari-2600-video-pinball	Prior+Duel hs	Score: 447408.6
atari-games-on-atari-2600-video-pinball	DDQN (tuned) hs	Score: 367823.7
atari-games-on-atari-2600-video-pinball	DQN noop	Score: 196760.4
atari-games-on-atari-2600-wizard-of-wor	DQN noop	Score: 2704.0
atari-games-on-atari-2600-wizard-of-wor	DDQN (tuned) hs	Score: 6201.0
atari-games-on-atari-2600-wizard-of-wor	DQN hs	Score: 1609.0
atari-games-on-atari-2600-wizard-of-wor	Prior+Duel hs	Score: 10471.0
atari-games-on-atari-2600-zaxxon	DQN hs	Score: 4412.0
atari-games-on-atari-2600-zaxxon	Prior+Duel hs	Score: 11320.0
atari-games-on-atari-2600-zaxxon	DDQN (tuned) hs	Score: 8593.0
atari-games-on-atari-2600-zaxxon	DQN noop	Score: 5363.0
