General Video Game Playing: Related Research List

  • Adaptive Normalization
    • H. van Hasselt et al., “Learning values across many orders of magnitude,” arXiv:1602.07714 [cs, stat], Feb. 2016.
    • R. Munos et al., “Safe and Efficient Off-Policy Reinforcement Learning,” arXiv preprint arXiv:1606.02647, 2016.
  • Empowerment
    • S. Mohamed and D. J. Rezende, “Variational information maximisation for intrinsically motivated reinforcement learning,” Advances in Neural Information Processing Systems, pp. 2125–2133, 2015.
  • Universal Values
    • T. Schaul et al., “Universal value function approximators,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1312–1320.
  • Macro-Actions
    • A. Vezhnevets et al., “Strategic Attentive Writer for Learning Macro-Actions,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 3486–3494.
  • Successor Features
    • A. Barreto et al., “Successor Features for Transfer in Reinforcement Learning,” arXiv preprint arXiv:1606.05312, 2016.
  • Progressive Network
    • A. A. Rusu et al., “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
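
Below is a minimal sketch (in NumPy, with illustrative layer sizes) of the lateral-connection idea behind progressive networks from the last item above: a new column trained on a new task keeps its own weights while also reading the hidden activations of the previous, frozen column through learned lateral adapters.

  import numpy as np

  def relu(x):
      return np.maximum(x, 0.0)

  rng = np.random.default_rng(0)

  # Column 1: assumed already trained on task A, then frozen.
  W1_frozen = [rng.normal(size=(16, 8)), rng.normal(size=(8, 4))]

  # Column 2: new trainable weights for task B, plus a lateral adapter U that
  # reads the frozen column's layer-1 activations into column 2's layer 2.
  W2 = [rng.normal(size=(16, 8)), rng.normal(size=(8, 4))]
  U2 = rng.normal(size=(8, 4))

  def forward(x):
      # Frozen column forward pass (no gradients would flow here).
      h1_1 = relu(x @ W1_frozen[0])
      # New column: layer 2 sums its own input with the lateral input from column 1.
      h2_1 = relu(x @ W2[0])
      h2_2 = relu(h2_1 @ W2[1] + h1_1 @ U2)
      return h2_2

  print(forward(rng.normal(size=(1, 16))))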

Deep Reinforcement Learning Algorithms

  • Deep Q-Learning
    • V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
  • Double DQN
    • H. van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-learning,” arXiv:1509.06461 [cs], Sep. 2015.
  • DDPG
    • T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv:1509.02971 [cs, stat], Sep. 2015.
  • Async. Deep RL
    • V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” arXiv:1602.01783 [cs], Feb. 2016.
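
As a quick illustration of the Double DQN entry above, the sketch below contrasts the standard DQN target with the Double DQN target, which selects the next action with the online network and evaluates it with the target network; the small Q-arrays stand in for the two networks' outputs.

  import numpy as np

  def dqn_target(r, q_target_next, gamma=0.99, done=False):
      # DQN: the target network both selects and evaluates the next action.
      return r + (0.0 if done else gamma * np.max(q_target_next))

  def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99, done=False):
      # Double DQN: the online network selects the action, the target network evaluates it.
      a = int(np.argmax(q_online_next))
      return r + (0.0 if done else gamma * q_target_next[a])

  q_online_next = np.array([1.0, 2.5, 0.3])   # online-network Q(s', .)
  q_target_next = np.array([2.0, 1.8, 0.5])   # target-network Q(s', .)
  print(dqn_target(1.0, q_target_next))                        # 1 + 0.99 * 2.0
  print(double_dqn_target(1.0, q_online_next, q_target_next))  # 1 + 0.99 * 1.8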

Exploration Improvements

  • Bootstrapped DQN
    • I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep Exploration via Bootstrapped DQN,” arXiv:1602.04621 [cs, stat], Feb. 2016.
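
A minimal sketch of the deep-exploration mechanism in Bootstrapped DQN: K value heads share a trunk, one head is sampled at the start of each episode and followed greedily for that whole episode. The random linear heads and placeholder observations below are illustrative stand-ins for the network and environment.

  import numpy as np

  rng = np.random.default_rng(0)
  K, n_actions, feat_dim = 10, 4, 8

  # K bootstrap heads on top of a shared feature trunk (here just random linear maps).
  heads = rng.normal(size=(K, feat_dim, n_actions))

  def run_episode(env_steps=5):
      k = rng.integers(K)                  # sample one head per episode
      for _ in range(env_steps):
          obs = rng.normal(size=feat_dim)  # placeholder for the trunk's features
          q = obs @ heads[k]               # the sampled head's Q-values
          action = int(np.argmax(q))       # act greedily w.r.t. that head all episode
          # ... step the environment and store the transition with a bootstrap mask ...
      return k

  print("episode used head", run_episode())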

Replay Memory Improvements

  • Prioritized Replay
    • T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized Experience Replay,” arXiv:1511.05952 [cs], Nov. 2015.
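
The sketch below shows the proportional variant of prioritized sampling: transitions are drawn with probability proportional to priority^alpha and reweighted with importance-sampling weights. The tiny priority array is illustrative, and a flat array is used instead of the paper's sum-tree.

  import numpy as np

  rng = np.random.default_rng(0)

  priorities = np.array([0.1, 2.0, 0.5, 1.0, 0.05])  # e.g. |TD error| + epsilon
  alpha, beta = 0.6, 0.4

  # Sampling probabilities P(i) proportional to p_i^alpha.
  probs = priorities ** alpha
  probs /= probs.sum()

  # Draw a minibatch of indices according to P(i).
  idx = rng.choice(len(priorities), size=3, p=probs)

  # Importance-sampling weights w_i = (N * P(i))^(-beta), normalized by the max weight.
  weights = (len(priorities) * probs[idx]) ** (-beta)
  weights /= weights.max()

  print("sampled indices:", idx)
  print("IS weights:     ", np.round(weights, 3))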

Architecture

  • Dueling Network
    • Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, 2015.
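
As a quick illustration of the dueling architecture's aggregation step, the sketch below combines a state-value stream V(s) and an advantage stream A(s, ·) into Q(s, ·) using the mean-subtracted advantage; the stream outputs are placeholder numbers rather than real network outputs.

  import numpy as np

  def dueling_q(v, advantages):
      # Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')), the identifiable aggregation
      # used by the dueling architecture.
      advantages = np.asarray(advantages, dtype=float)
      return v + (advantages - advantages.mean())

  v = 3.0                       # output of the value stream V(s)
  adv = [0.5, -0.2, 1.1, -1.4]  # output of the advantage stream A(s, .)
  print(dueling_q(v, adv))      # Q-values whose mean equals V(s)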

Memory

  • Neural Turing Machine
    • A. Graves, G. Wayne, and I. Danihelka, “Neural Turing Machines,” arXiv:1410.5401 [cs], Oct. 2014.
  • External Memory
    • A. Graves et al., “Hybrid computing using a neural network with dynamic external memory,” Nature, vol. 538, no. 7626, pp. 471–476, Oct. 2016.
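
A minimal sketch of content-based addressing, the read mechanism shared by the Neural Turing Machine and the DNC: a key is compared to every memory row by cosine similarity, sharpened by a strength beta, normalized into read weights, and used to form a weighted read vector. The memory size and contents are illustrative.

  import numpy as np

  rng = np.random.default_rng(0)
  memory = rng.normal(size=(8, 4))   # 8 memory slots, each holding a 4-dim vector

  def content_read(memory, key, beta=5.0):
      # Cosine similarity between the key and every memory row.
      sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
      # Sharpen with strength beta and normalize into read weights.
      w = np.exp(beta * sim)
      w /= w.sum()
      # The read vector is the weight-averaged memory content.
      return w, w @ memory

  key = memory[3] + 0.1 * rng.normal(size=4)    # a noisy copy of slot 3 as the query
  weights, read = content_read(memory, key)
  print("read weights:", np.round(weights, 3))  # should concentrate on slot 3
  print("read vector: ", np.round(read, 3))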

Reward Shaping

  • UNREAL
    • M. Jaderberg et al., “Reinforcement Learning with Unsupervised Auxiliary Tasks,” arXiv:1611.05397 [cs], Nov. 2016.
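
The sketch below only illustrates how UNREAL-style training combines the base A3C loss with weighted auxiliary losses (pixel control, reward prediction, value replay); the individual loss values and the lambda weights are placeholders, not values from the paper.

  # Hypothetical loss values computed elsewhere during one training step.
  loss_a3c = 0.82              # base A3C actor-critic loss
  loss_pixel_control = 0.31    # auxiliary pixel-control Q-learning loss
  loss_reward_pred = 0.15      # auxiliary reward-prediction (classification) loss
  loss_value_replay = 0.27     # extra value-function regression on replayed data

  # UNREAL optimizes a weighted sum; the lambda coefficients are tunable hyperparameters.
  lambda_pc, lambda_rp, lambda_vr = 0.05, 1.0, 1.0
  total = (loss_a3c
           + lambda_pc * loss_pixel_control
           + lambda_rp * loss_reward_pred
           + lambda_vr * loss_value_replay)
  print("total loss:", total)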

AI Platform

  • ViZDoom
    • M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning,” arXiv:1605.02097 [cs], May 2016.
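
A minimal usage sketch of the ViZDoom Python API (DoomGame / make_action), playing a few episodes with random actions; the scenario config path is an assumption and depends on the local installation.

  import random
  from vizdoom import DoomGame

  game = DoomGame()
  game.load_config("scenarios/basic.cfg")   # assumed path to a bundled scenario config
  game.init()

  # One-hot button combinations; the button count comes from the scenario config.
  n_buttons = game.get_available_buttons_size()
  actions = [[i == j for i in range(n_buttons)] for j in range(n_buttons)]

  for _ in range(3):                        # play a few random episodes
      game.new_episode()
      while not game.is_episode_finished():
          state = game.get_state()          # screen buffer, game variables, etc.
          reward = game.make_action(random.choice(actions))
      print("episode reward:", game.get_total_reward())

  game.close()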

Application

  • AlphaGo
    • D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
  • Arnold
    • G. Lample and D. S. Chaplot, “Playing FPS Games with Deep Reinforcement Learning,” arXiv:1609.05521 [cs], Sep. 2016.
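
As a small illustration of the search side of AlphaGo, the sketch below implements PUCT-style action selection used inside MCTS: exploit the current value estimate Q while exploring moves with a high policy prior P and a low visit count N. The numbers and the exploration constant are illustrative.

  import numpy as np

  def puct_select(Q, N, P, c_puct=1.0):
      # a = argmax_a [ Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)) ]
      u = c_puct * P * np.sqrt(N.sum()) / (1.0 + N)
      return int(np.argmax(Q + u))

  Q = np.array([0.10, 0.30, 0.05])   # mean action values from simulations so far
  N = np.array([10, 40, 1])          # visit counts
  P = np.array([0.2, 0.5, 0.3])      # policy-network priors
  print("selected move:", puct_select(Q, N, P))   # the rarely visited move wins here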