Exploration and Planning in Reinforcement Learning

Regret of Policy, Boltzmann strategy, Hoeffding inequality

Exploration is needed to find unknown actions that lead to very large rewards. Most reinforcement learning algorithms share one problem: they learn by trying different actions and seeing which works best. We can use a few ad hoc heuristics (e.g. epsilon-greedy exploration) to mitigate the problem and speed up the learning process. Multi-armed bandits …
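As a rough illustration of the epsilon-greedy heuristic mentioned above, here is a minimal Python sketch on a toy three-armed Bernoulli bandit. The function name `epsilon_greedy_bandit`, the arm probabilities, the step count, and the default `epsilon=0.1` are all assumptions chosen for the example, not values from the original post.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm.
    Returns (estimates, counts, regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    best_mean = max(true_means)
    regret = 0.0                 # cumulative expected regret

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Sample a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret: gap between the best arm and the chosen arm.
        regret += best_mean - true_means[arm]

    return estimates, counts, regret


estimates, counts, regret = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
print("cumulative expected regret:", round(regret, 1))
```

With a fixed `epsilon`, the agent keeps exploring at a constant rate, so the expected regret in this sketch keeps growing roughly linearly with the number of steps. That is one reason a simple heuristic like this is only a partial fix.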