Reinforcement Learning

The Problem with Policy Gradient

If you’ve read my article about the REINFORCE algorithm, you should be familiar with the update that’s typically used in policy gradient methods. $$\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}(\tau)} \left[ \left( \sum_{t} \nabla_\theta \log{\pi_\theta}(a_t \mid s_t)\right) \left(\sum_t r(s_t, a_t)\right)\right]$$
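
As a rough illustration of the expectation above, here is a minimal NumPy sketch of its Monte Carlo estimate. It assumes a tabular softmax policy with parameters `theta` of shape `(n_states, n_actions)` and trajectories stored as lists of `(state, action, reward)` tuples. The helper names `grad_log_pi` and `reinforce_gradient` are hypothetical, chosen for this example rather than taken from the article.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def grad_log_pi(theta, s, a):
    """Gradient of log pi_theta(a | s) w.r.t. theta for a tabular softmax policy.

    Assumption: pi_theta(. | s) = softmax(theta[s]), so only row s is nonzero
    and equals onehot(a) - pi_theta(. | s).
    """
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g


def reinforce_gradient(theta, trajectories):
    """Average over trajectories of (sum_t grad log pi) * (sum_t r)."""
    total = np.zeros_like(theta)
    for traj in trajectories:
        # Sum of score-function terms along the trajectory.
        score = sum(grad_log_pi(theta, s, a) for s, a, _ in traj)
        # Total (undiscounted) reward of the trajectory.
        total_reward = sum(r for _, _, r in traj)
        total += score * total_reward
    return total / len(trajectories)


# Toy usage: 2 states, 2 actions, a single hand-written trajectory.
theta = np.zeros((2, 2))
trajectories = [[(0, 1, 1.0), (1, 0, 2.0)]]
grad = reinforce_gradient(theta, trajectories)
```

With uniform initial logits, each visited state's row of `grad` pushes probability toward the action that was taken, scaled by the total reward of the whole trajectory.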

Jun 3, 2019 8 min read Machine Learning, Reinforcement Learning

A Tutorial on the REINFORCE Algorithm

The setup for the general reinforcement learning problem is as follows. We’re given an environment $\mathcal{E}$ with a specified state space $\mathcal{S}$ and an action space $\mathcal{A}$ giving the allowable actions in each of those states.

Apr 18, 2018 5 min read Machine Learning, Reinforcement Learning