Practical Reinforcement Learning (Higher School of Economics)
I am very proud that I survived and completed this thorny but exciting journey with Honors. I spent nearly a month on this very challenging course, which covers an enormous amount of knowledge. The lecture videos are all well prepared; they not only helped me consolidate my original understanding, but also…
Tag: Practical Reinforcement Learning
Exploration and Planning in Reinforcement Learning
Exploration is needed to find unknown actions that lead to very large rewards. Most reinforcement learning algorithms share one problem: they learn by trying different actions and seeing which work better. We can use a few made-up heuristics (e.g. epsilon-greedy exploration) to mitigate the problem and speed up the learning process. Multi-armed bandits…
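The epsilon-greedy heuristic mentioned above fits in a few lines; here is a minimal sketch in Python (the function name and the use of a plain NumPy array of action values are illustrative assumptions, not code from the post):

```python
import numpy as np

_rng = np.random.default_rng()

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: 1-D array of estimated action values for the current state.
    epsilon:  exploration rate in [0, 1].
    """
    if _rng.random() < epsilon:
        return int(_rng.integers(len(q_values)))  # explore: uniform random action
    return int(np.argmax(q_values))               # exploit: current best estimate
```

Annealing epsilon from a large value toward a small one over training is a common way to shift gradually from exploration to exploitation.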
Reinforcement Learning: Policy Gradient Methods
The problems of value-based methods The idea behind value-based reinforcement learning (say, Q-learning) is to find an optimal action in a state based on how much discounted reward you will get by following a policy. The first problem here is that value-based methods do not explicitly learn “what to do”; instead they learn “what kind of…
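To make the contrast concrete, here is a small illustrative sketch (my own assumption of a linear softmax policy, not the post's code) of how a value-based agent recovers its action implicitly from Q-values, while a policy-based agent parameterizes the action distribution directly:

```python
import numpy as np

# Value-based: the policy is implicit, recovered by maximizing learned Q-values.
def greedy_action_from_q(q_values):
    return int(np.argmax(q_values))

# Policy-based: the policy itself is the learned object, e.g. a softmax over
# action preferences; theta is what a policy gradient method would update.
def softmax_policy(theta, state_features):
    logits = state_features @ theta           # (d,) features x (d, n_actions) weights
    exps = np.exp(logits - logits.max())      # numerically stable softmax
    probs = exps / exps.sum()
    action = np.random.choice(len(probs), p=probs)
    return action, probs
```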
Deep Q-Network in Reinforcement Learning
Deep Q-Network (DQN) is the first successful application of reinforcement learning that learns directly from raw visual inputs, as humans do, across a wide variety of environments. It uses a deep convolutional network without hand-designed features. DQN is essentially standard Q-learning bundled with stability tricks and epsilon-greedy exploration. In any application, the first thing you…
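As a rough illustration of what that bundling amounts to (my own PyTorch-style sketch, not the post's code), the core of DQN is a one-step TD loss whose target is computed with a separate, periodically synchronized target network:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss for DQN.

    `batch` is assumed to hold tensors (states, actions, rewards, next_states, dones),
    with `actions` as int64 indices and `dones` as 0/1 floats.
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken in the batch.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target from the frozen target network (no gradient flows here).
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * max_next_q

    return F.smooth_l1_loss(q_sa, target)
```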
Supervised Learning in Reinforcement Learning
Reduction to a supervised learning problem In the tabular method, each Q(s, a) can be seen as a parameter. There are more parameters than states, because each state has as many parameters as it has possible actions. There are also situations where states include continuous components. It means we need a…
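To illustrate the reduction, here is a tiny sketch (my own, assuming a linear feature map over state-action pairs) where the Q-function is fit to TD targets exactly like an ordinary regression problem:

```python
import numpy as np

def fit_q_linear(features, td_targets, reg=1e-3):
    """Least-squares fit of Q(s, a) ~ features @ w to precomputed TD targets.

    features:   (N, d) matrix, one feature vector per sampled (state, action) pair
    td_targets: (N,)   vector of r + gamma * max_a' Q(s', a') regression targets
    """
    d = features.shape[1]
    A = features.T @ features + reg * np.eye(d)   # ridge term keeps A invertible
    b = features.T @ td_targets
    return np.linalg.solve(A, b)                  # weight vector w
```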
Model-free Reinforcement Learning
Value Iteration in the real world In the real world, we don't have the state transition probability distribution or the reward function. You may try sampling them, but you will never know the exact probabilities. As a result, you cannot compute the expectation of the action values. We want a new algorithm that would…
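The way out is to replace the expectation with sampled transitions; a minimal sketch of such a sample-based update (Q-learning style, with my own variable names and a dict-of-dicts Q-table) looks like this:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Update Q[s][a] from a single observed transition (s, a, r, s_next),
    without ever knowing the transition probabilities or the reward function."""
    td_target = r + gamma * max(Q[s_next].values())   # bootstrap from the next state
    Q[s][a] += alpha * (td_target - Q[s][a])          # move the estimate toward it
    return Q
```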
Dynamic Programming in RL
Reward “That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward).” (R. Sutton) This signal is the ‘reward’, and the sum of these signals is the ‘return’. Each immediate reward depends on the agent’s action and the environment…
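The return mentioned here is conventionally written as the (possibly discounted) cumulative sum of rewards; in the standard notation, which the truncated excerpt does not spell out, it reads:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma \le 1 .
```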