===== General Video Game Playing =====

==== Learning Algorithm ====

  * [[https://

==== Reward Shaping ====

  * [[https://

==== Simulation ====

  * [[https://
  * [[https://
  *

==== Learning by Instruction ====

  * [[https://
==== ?? ====

  * ??
    * Z. C. Lipton, J. Gao, L. Li, X. Li, F. Ahmed, and L. Deng, “Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking,” arXiv:
  * Adaptive Normalization
    * H. van Hasselt et al., “Learning values across many orders of magnitude,
  * [[deep learning:
    * R. Munos et al., “Safe and Efficient Off-Policy Reinforcement Learning,
  * Empowerment
    * S. Mohamed and D. J. Rezende, “Variational information maximisation for intrinsically motivated reinforcement learning,
  * Universal Values
    * T. Schaul et al., “Universal value function approximators,
  * Macro-Actions
    * A. Vezhnevets et al., “Strategic Attentive Writer for Learning Macro-Actions,
  * Successor Features
    * A. Barreto et al., “Successor Features for Transfer in Reinforcement Learning,
  * Progressive Network
    * A. A. Rusu et al., “Progressive neural networks,
  * [[https://

==== Deep Reinforcement Learning Algorithms ====

^Deep Q-Learning |V. Mnih et al., “Human-level control through deep reinforcement learning, |
^Double DQN |H. van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-learning, |
^DDPG |T. P. Lillicrap et al., “Continuous control with deep reinforcement learning, |
^Async. Deep RL |V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning, |
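
The rows above only cite the papers. As a rough illustration of how the bootstrap targets differ, here is a minimal NumPy sketch contrasting the standard DQN target with the Double DQN target; the batch size, discount factor, and Q-value arrays are hypothetical stand-ins, not code from the cited works.

<code python>
import numpy as np

# Hypothetical quantities for a batch of B transitions and A discrete actions:
#   q_next_online : (B, A) next-state Q-values from the online network
#   q_next_target : (B, A) next-state Q-values from the target network
#   rewards, dones: (B,)   immediate rewards and episode-termination flags
rng = np.random.default_rng(0)
B, A, gamma = 4, 3, 0.99
q_next_online = rng.normal(size=(B, A))
q_next_target = rng.normal(size=(B, A))
rewards = rng.normal(size=B)
dones = np.array([0.0, 0.0, 1.0, 0.0])

# DQN (Mnih et al.): bootstrap with the max of the target network's own estimates.
dqn_backup = q_next_target.max(axis=1)

# Double DQN (van Hasselt et al.): the online network picks the greedy action and
# the target network evaluates it, reducing the overestimation bias of the max.
greedy_actions = q_next_online.argmax(axis=1)
ddqn_backup = q_next_target[np.arange(B), greedy_actions]

dqn_target = rewards + gamma * (1.0 - dones) * dqn_backup
ddqn_target = rewards + gamma * (1.0 - dones) * ddqn_backup
print(dqn_target, ddqn_target)
</code>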
+ | |||
+ | ==== Exploration 개선 ==== | ||
+ | |||
+ | ^Count & Exploration |I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep Exploration via Bootstrapped DQN,” arXiv: | ||
+ | ^ | | | ||
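
As a toy illustration of the deep-exploration idea cited above (Osband et al.), the sketch below keeps several independent tabular Q-"heads" and commits to one randomly chosen head for a whole episode, so exploration is temporally consistent rather than per-step noise. The chain environment, sizes, and initial values are hypothetical; the paper uses neural-network heads sharing a torso.

<code python>
import numpy as np

rng = np.random.default_rng(0)
K, n_states, n_actions = 5, 10, 4
# K independent tabular Q-"heads" (stand-ins for the paper's bootstrapped network heads).
q_heads = rng.normal(scale=0.01, size=(K, n_states, n_actions))

def toy_step(state, action):
    """Hypothetical chain environment: action 0 moves right, anything else resets."""
    next_state = min(state + 1, n_states - 1) if action == 0 else 0
    return next_state, next_state == n_states - 1

def run_episode(max_steps=50):
    head = int(rng.integers(K))          # sample one head for the whole episode
    state = 0
    for _ in range(max_steps):
        action = int(q_heads[head, state].argmax())   # act greedily w.r.t. that head
        state, done = toy_step(state, action)
        if done:
            break
    return head

print("episode acted under head", run_episode())
</code>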
+ | |||
+ | ==== Replay memory 개선 ==== | ||
+ | |||
+ | ^Prioritized Replay |T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized Experience Replay,” arXiv: | ||
+ | ^ | | | ||
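
For context on what prioritized replay changes, here is a minimal sketch of proportional prioritized sampling in the spirit of Schaul et al.: transitions with larger TD error are replayed more often, and importance weights correct the induced bias. The alpha, beta, and epsilon values and the stored TD errors below are illustrative assumptions, not the paper's exact setup.

<code python>
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, eps = 0.6, 0.4, 1e-6

td_errors = rng.normal(size=1000)                 # hypothetical stored TD errors
priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()             # sampling probability per transition

batch_idx = rng.choice(len(td_errors), size=32, p=probs)
weights = (len(td_errors) * probs[batch_idx]) ** (-beta)
weights /= weights.max()                          # normalize weights for stability

# `weights` would scale each sampled transition's loss; after the gradient step,
# the sampled transitions' priorities are refreshed with their new |TD error|.
print(batch_idx[:5], weights[:5])
</code>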
+ | |||
+ | ==== Architecture ==== | ||
+ | |||
+ | ^Dueling Network | Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning, | ||
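
The dueling architecture cited above splits the Q-head into a state-value stream V(s) and an advantage stream A(s,a), then recombines them while subtracting the mean advantage to keep the decomposition identifiable. Below is a minimal NumPy sketch of that recombination step, with hypothetical feature and weight shapes standing in for the shared convolutional torso.

<code python>
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 4
features = rng.normal(size=n_features)              # output of a shared torso (hypothetical)

w_value = rng.normal(size=n_features)               # value stream -> scalar V(s)
w_adv = rng.normal(size=(n_features, n_actions))    # advantage stream -> A(s, a)

value = features @ w_value
advantages = features @ w_adv
q_values = value + advantages - advantages.mean()   # Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
print(q_values)
</code>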
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== AI Platform ==== | ||
+ | |||
+ | ^ViZDoom |M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning, | ||
+ | ^OpenAIGym|https:// | ||
+ | ^Universe|https:// | ||
+ | ^DeepMind Lab|https:// | ||
+ | ^Malmo|https:// | ||
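
Most of these platforms expose the same reset/step episode loop. The snippet below runs a random agent on OpenAI Gym's CartPole-v0 purely to show that loop; it assumes the classic Gym API, and newer gym/gymnasium releases return extra values from reset() and step(), so the unpacking may need adjusting.

<code python>
import gym

env = gym.make("CartPole-v0")
obs = env.reset()                        # classic API: reset() returns only the observation
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()   # random policy as a placeholder agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
</code>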
+ | |||
+ | ==== Application ==== | ||
+ | |||
+ | ^AlphaGo|D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.| | ||
+ | ^Anold |G. Lample and D. S. Chaplot, “Playing FPS Games with Deep Reinforcement Learning, | ||
+ | |||
===== Implementations =====

  * https://