In this part of the book we describe three fundamental classes of methods for solving the reinforcement learning problem: dynamic programming, Monte Carlo methods, and temporal-difference learning. All of these methods solve the full version of the problem, including delayed rewards.

Each of the three classes of methods has its strengths and weaknesses. Dynamic programming methods are very well developed mathematically, but require a complete and accurate model of the environment. Monte Carlo methods don't require a model and are very simple conceptually, but are not suited for step-by-step incremental computation. Finally, temporal-difference methods require no model and are fully incremental, but are more complex to analyze. The methods also differ in several ways with respect to their efficiency and speed of convergence. In the third part of this book we explore how these methods can be combined to obtain the best features of each.
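The difference in incrementality between Monte Carlo and temporal-difference methods can be illustrated with a minimal sketch. It is not taken from the text and makes several assumptions: tabular state-value prediction, a constant step size `alpha`, a discount factor `gamma`, and an episode recorded as a list of `(state, reward)` pairs. The function names, that episode format, and the two-state example are all hypothetical. The Monte Carlo update cannot run until the episode has ended, because its target is the actual return. The TD update can run after every step, because it uses the current estimate of the next state's value in place of the rest of the return.

```python
# Illustrative sketch only (hypothetical names and data format, not from the text).
# Assumes tabular values V (a dict), constant step size alpha, discount gamma,
# and an episode given as (state, reward) pairs, where reward is received on
# leaving that state.

def monte_carlo_update(V, episode, alpha=0.1, gamma=1.0):
    """Every-visit constant-alpha Monte Carlo: requires the complete episode,
    since each target is the actual return G from that state to the end."""
    G = 0.0
    # Walk backward so G accumulates the return from each state onward.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """TD(0): updates after a single step, bootstrapping from the current
    estimate of the next state's value instead of waiting for the return."""
    target = reward + (0.0 if terminal else gamma * V[next_state])
    V[state] += alpha * (target - V[state])
    return V


# Hypothetical two-state episode: A -> B -> terminal, rewards 0 then 1.
V_mc = monte_carlo_update({"A": 0.0, "B": 0.0}, [("A", 0.0), ("B", 1.0)], alpha=0.5)

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B", alpha=0.5)  # can run as soon as the first step ends
td0_update(V_td, "B", 1.0, None, alpha=0.5, terminal=True)

print(V_mc)  # {'A': 0.5, 'B': 0.5}
print(V_td)  # {'A': 0.0, 'B': 0.5}
```

After one episode, the Monte Carlo sketch has already carried the final reward back to A, while TD(0) has changed only B. In exchange, each TD update was available the moment its step finished, without storing the episode.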

Fri May 30 21:13:22 EDT 1997