6 Temporal Difference Learning
6.1 TD Prediction
6.2 Advantages of TD Prediction Methods
6.3 Optimality of TD(0)
6.4 Sarsa: On-Policy TD Control
6.5 Q-learning: Off-Policy TD Control
6.6 Actor-Critic Methods (*)
6.7 R-Learning for Undiscounted Continual Tasks (*)
6.8 Games, After States, and Other Special Cases
6.9 Conclusions
6.10 Historical and Bibliographical Remarks
Richard Sutton
Fri May 30 13:53:05 EDT 1997