Table of Contents

General Video Game Playing

Learning Algorithm

Reward Shaping

Simulation

Learning by Instruction

??

Deep Reinforcement Learning Algorithms

Deep Q-Learning V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
Double DQN H. van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-learning,” arXiv:1509.06461 [cs], Sep. 2015.
DDPG T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv:1509.02971 [cs, stat], Sep. 2015.
Async. Deep RL V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” arXiv:1602.01783 [cs], Feb. 2016.
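The DQN and Double DQN entries above differ only in how the bootstrap target is formed. As a rough illustration (not from the slides; function names and the toy batch are my own), the one-step targets can be sketched in NumPy:

```python
import numpy as np

def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """DQN target: r + gamma * max_a' Q(s', a'), zeroed on terminal steps."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN: the online net picks a', the target net evaluates it,
    reducing the overestimation bias of the plain max (van Hasselt et al.)."""
    a_star = next_q_online.argmax(axis=1)
    rows = np.arange(len(a_star))
    return rewards + gamma * (1.0 - dones) * next_q_target[rows, a_star]

# toy batch: 2 transitions, 3 actions; second transition is terminal
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0, 1.0],
                   [1.0, 0.0, 3.0]])
dones = np.array([0.0, 1.0])
targets = dqn_targets(rewards, next_q, dones)
print(targets)  # [2.98, 0.]  (1.0 + 0.99 * 2.0, and 0 for the terminal step)
```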

Improving Exploration

Bootstrapped DQN I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep Exploration via Bootstrapped DQN,” arXiv:1602.04621 [cs, stat], Feb. 2016.

Improving Replay Memory

Prioritized Replay T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized Experience Replay,” arXiv:1511.05952 [cs], Nov. 2015.
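The core of prioritized replay is sampling transitions with probability proportional to priority^alpha and correcting the bias with importance-sampling weights. A minimal sketch (helper name and parameters are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Sample indices with P(i) ∝ priority_i^alpha and return the
    normalized importance-sampling weights (Schaul et al., 2015)."""
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = rng.choice(len(priorities), size=batch_size, p=probs)
    # w_i = (N * P(i))^(-beta), normalized by the max for stability
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

priorities = np.array([1.0, 2.0, 3.0, 4.0])  # e.g. |TD error| + epsilon
idx, weights = sample_prioritized(priorities, batch_size=2)
```

In practice a sum-tree makes sampling O(log N) instead of this O(N) scan; the math is the same.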

Architecture

Dueling Network Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, 2015.
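The dueling architecture splits the head into a state-value stream V(s) and an advantage stream A(s,a), then recombines them by subtracting the mean advantage so the decomposition is identifiable. A sketch of just that aggregation step (array names are my own):

```python
import numpy as np

def dueling_q(value, advantage):
    """Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)  (Wang et al., 2015).
    value: (batch,) state values; advantage: (batch, n_actions)."""
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)

v = np.array([1.0])
a = np.array([[2.0, 0.0, 1.0]])  # mean advantage = 1.0
q = dueling_q(v, a)
print(q)  # [[2. 0. 1.]]
```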

AI Platform

ViZDoom M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning,” arXiv:1605.02097 [cs], May 2016.
OpenAI Gym https://gym.openai.com/
Universe https://universe.openai.com/
DeepMind Lab https://deepmind.com/blog/open-sourcing-deepmind-lab/
Malmo https://github.com/Microsoft/malmo

Application

AlphaGo D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
Arnold G. Lample and D. S. Chaplot, “Playing FPS Games with Deep Reinforcement Learning,” arXiv:1609.05521 [cs], Sep. 2016.

Implementations