Reinforcement Learning: Policy Gradient Methods

The problems of value-based methods

The idea behind value-based reinforcement learning (say, Q-learning) is to find the optimal action in a given state, based on how much discounted reward you can expect to collect by following a policy from that point on. The first problem is that value-based methods do not explicitly learn “what to do”; instead, they learn “what kind of …
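To make the value-based idea concrete, here is a minimal sketch in Python, assuming a tabular setting with a single state and three actions. The names `q_values`, `greedy_action`, and `q_update`, the action labels, and all numbers are hypothetical and chosen only for illustration. It shows that the learner stores value estimates, and the action to take is only derived from them afterwards by an argmax.

```python
# Hypothetical value estimates for one state with three actions
# (the numbers are made up for illustration).
q_values = {"left": 0.2, "stay": 0.5, "right": 0.9}


def greedy_action(q):
    """Derive an action from value estimates by picking the largest one."""
    return max(q, key=q.get)


def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update for a single (state, action) pair.

    q_sa:       current estimate Q(s, a)
    reward:     reward observed after taking a in s
    max_q_next: max over a' of Q(s', a') in the next state
    alpha:      learning rate (assumed value)
    gamma:      discount factor (assumed value)
    """
    td_target = reward + gamma * max_q_next
    return q_sa + alpha * (td_target - q_sa)


if __name__ == "__main__":
    print(greedy_action(q_values))
    print(q_update(q_sa=0.0, reward=1.0, max_q_next=0.0, alpha=0.5, gamma=0.9))
```

Note that nothing in the update rule mentions a policy directly: only the value estimate changes, and the behaviour changes as a side effect of the argmax.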