====== Exploration ======

  * https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-7-action-selection-strategies-for-exploration-d3a97b7cceaf

  * $\epsilon$-greedy: with probability $\epsilon$ pick a random action, otherwise the greedy one
<code python>
eps = 0.1
if np.random.rand() < eps:
    action = np.random.randint(n_actions)
else:
    q_val = model(state)
    action = np.argmax(q_val)
</code>

  * Boltzmann (softmax) sampling: sample actions in proportion to $\exp(Q/t)$, where $t$ is the temperature
<code python>
t = 0.5  # temperature; lower -> closer to greedy
q_val = model(state)
probs = F.softmax(q_val / t, dim=1)
# torch
action = probs.multinomial(num_samples=1).item()
# numpy
action = np.random.choice(n_actions, p=probs.detach().cpu().numpy().squeeze())
</code>

  * Bayesian (sampled action plus its log-probability)
<code python>
t = 0.5
q_val = model(state)
probs = F.softmax(q_val / t, dim=1)
log_prob = F.log_softmax(q_val / t, dim=1)
# torch
action = probs.multinomial(num_samples=1)
action_log_prob = log_prob.gather(1, action)
# numpy
action = np.random.choice(n_actions, p=probs)
action_log_prob = log_prob[0][action]
</code>

  * UCB (upper confidence bound)
  * Thompson sampling
    * https://brunch.co.kr/@chris-song/66

===== Survey =====

  * [[https://lilianweng.github.io/lil-log/2020/06/07/exploration-strategies-in-deep-reinforcement-learning.html|Exploration Strategies in Deep Reinforcement Learning, 2020-06]]
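The UCB bullet above has no snippet; here is a minimal UCB1 sketch for a multi-armed bandit (the names `counts`, `values`, and `ucb1_action` are my own, not from the source):

```python
import numpy as np

def ucb1_action(counts, values, c=2.0):
    """UCB1: pick the arm maximizing mean reward + exploration bonus.

    counts[i] = number of times arm i was pulled
    values[i] = running mean reward of arm i
    """
    counts = np.asarray(counts, dtype=float)
    values = np.asarray(values, dtype=float)
    # pull every arm at least once before applying the formula
    if (counts == 0).any():
        return int(np.argmax(counts == 0))
    total = counts.sum()
    bonus = np.sqrt(c * np.log(total) / counts)  # shrinks as an arm is pulled more
    return int(np.argmax(values + bonus))
```

Unlike $\epsilon$-greedy, the exploration here is directed: under-sampled arms get a large bonus and are tried first.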
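The Thompson sampling bullet likewise has no code; a minimal Beta-Bernoulli sketch, assuming binary rewards (the names `successes`, `failures`, and `thompson_action` are my own):

```python
import numpy as np

def thompson_action(successes, failures, rng=None):
    """Thompson sampling for Bernoulli bandits: draw one win-rate sample
    per arm from its Beta posterior and play the arm with the largest draw."""
    rng = rng or np.random.default_rng(0)
    # Beta(1, 1) uniform prior; posterior is Beta(successes + 1, failures + 1)
    samples = rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))
```

Exploration comes from the posterior width: an arm with few pulls has a wide Beta and is sampled high often enough to get tried, while a clearly bad arm is sampled high increasingly rarely.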