Dynamic Programming in RL

Reward

"That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." (R. Sutton)

This signal is the 'reward', and the sum of the signals is the 'return'. Each immediate reward depends on the agent's action and …
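To make the distinction between reward and return concrete, here is a minimal sketch that computes the return from a list of rewards. It assumes rewards[k] holds the reward received after the agent's action at step k. The discount factor gamma is an addition beyond this excerpt (a common convention in RL); with the default gamma=1.0 the function gives exactly the plain cumulative sum described above. The function name compute_return is hypothetical.

```python
from typing import Sequence


def compute_return(rewards: Sequence[float], t: int = 0, gamma: float = 1.0) -> float:
    """Return G_t, the sum of rewards received from step t onward.

    Assumptions (not from the original text):
      - rewards[k] is the reward received after the action taken at step k.
      - gamma is an optional discount factor; gamma=1.0 yields the plain sum.
    """
    g = 0.0
    # Accumulate backwards using G_k = r_k + gamma * G_{k+1},
    # which equals the (discounted) sum of rewards from step t to the end.
    for r in reversed(rewards[t:]):
        g = r + gamma * g
    return g


episode_rewards = [1.0, 0.0, 2.0, 3.0]
print(compute_return(episode_rewards))             # 6.0, the plain cumulative sum
print(compute_return(episode_rewards, t=2))        # 5.0, the return from step 2 onward
print(compute_return(episode_rewards, gamma=0.5))  # 1.875, the discounted return
```

Accumulating backwards avoids computing powers of gamma explicitly, and it reflects the recursive relation between the return at one step and the return at the next. Dynamic programming methods rely on exactly that recursion.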