===== General Video Game Playing =====

==== Learning Algorithm ====
  * [[https://arxiv.org/abs/1704.04651|The Reactor: A Sample-Efficient Actor-Critic Architecture]]

==== Reward Shaping ====
  * [[https://arxiv.org/abs/1611.05397|Reinforcement Learning with Unsupervised Auxiliary Tasks]]

==== Simulation ====
  * [[https://arxiv.org/abs/1704.02254|Recurrent Environment Simulators]]
  * [[https://arxiv.org/abs/1507.08750|Action-Conditional Video Prediction using Deep Networks in Atari Games]]

==== Learning by Instruction ====
  * [[https://arxiv.org/abs/1704.05539|Beating Atari with Natural Language Guided Reinforcement Learning]]

==== ?? ====
  * ??
    * Z. C. Lipton, J. Gao, L. Li, X. Li, F. Ahmed, and L. Deng, “Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking,” arXiv:1608.05081 [cs, stat], Aug. 2016.
  * Adaptive Normalization
    * H. van Hasselt et al., “Learning values across many orders of magnitude,” arXiv:1602.07714 [cs, stat], Feb. 2016.
  * [[deep learning:safe off policy|Safe Off-Policy]]
    * R. Munos et al., “Safe and Efficient Off-Policy Reinforcement Learning,” arXiv preprint arXiv:1606.02647, 2016.
  * Empowerment
    * S. Mohamed and D. J. Rezende, “Variational information maximisation for intrinsically motivated reinforcement learning,” Advances in Neural Information Processing Systems, pp. 2125–2133, 2015.
  * Universal Values
    * T. Schaul et al., “Universal value function approximators,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1312–1320.
  * Macro-Actions
    * A. Vezhnevets et al., “Strategic Attentive Writer for Learning Macro-Actions,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 3486–3494.
  * Successor Features
    * A. Barreto et al., “Successor Features for Transfer in Reinforcement Learning,” arXiv preprint arXiv:1606.05312, 2016.
  * Progressive Network
    * A. A. Rusu et al., “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
  * [[https://arxiv.org/pdf/1310.8499.pdf|2014-05, Deep AutoRegressive Networks]]

==== Deep Reinforcement Learning Algorithms ====
^ Deep Q-Learning | V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. |
^ Double DQN | H. van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-learning,” arXiv:1509.06461 [cs], Sep. 2015. |
^ DDPG | T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv:1509.02971 [cs, stat], Sep. 2015. |
^ Async. Deep RL | V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” arXiv:1602.01783 [cs], Feb. 2016. |

==== Exploration Improvements ====
^ Bootstrapped DQN | I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep Exploration via Bootstrapped DQN,” arXiv:1602.04621 [cs, stat], Feb. 2016. |

==== Replay Memory Improvements ====
^ Prioritized Replay | T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized Experience Replay,” arXiv:1511.05952 [cs], Nov. 2015. |

==== Architecture ====
^ Dueling Network | Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, 2015. |

==== AI Platform ====
^ ViZDoom | M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning,” arXiv:1605.02097 [cs], May 2016. |
^ OpenAI Gym | https://gym.openai.com/ |
^ Universe | https://universe.openai.com/ |
^ DeepMind Lab | https://deepmind.com/blog/open-sourcing-deepmind-lab/ |
^ Malmo | https://github.com/Microsoft/malmo |

==== Application ====
^ AlphaGo | D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. |
^ Arnold | G. Lample and D. S. Chaplot, “Playing FPS Games with Deep Reinforcement Learning,” arXiv:1609.05521 [cs], Sep. 2016. |

===== Implementations =====
  * https://github.com/google/dopamine
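As a concrete anchor for the Double DQN entry above, here is a minimal sketch of its target computation in plain Python. The function name and list-based inputs are illustrative (not taken from any of the cited codebases); the point is only the decoupling of action selection from action evaluation described in van Hasselt et al., arXiv:1509.06461.

```python
def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Double DQN target for a single transition.

    The online network selects the greedy next action, while the
    target network evaluates it; this decoupling reduces the
    overestimation bias of standard Q-learning.
    """
    # Greedy action index according to the online network's Q-values.
    greedy = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Terminal states contribute no bootstrapped future value.
    bootstrap = 0.0 if done else gamma * q_target_next[greedy]
    return reward + bootstrap
```

Standard DQN would instead use `max(q_target_next)` directly, letting the same network both select and evaluate the action.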
drl/start.txt · Last modified: 2024/03/23 02:42 by 127.0.0.1