Part III: A Unified View

So far we have discussed three classes of methods for solving the reinforcement learning problem: dynamic programming, Monte Carlo methods, and temporal-difference learning. Although each of these is different, they are not really alternatives in the sense that one must pick one or the other. It is perfectly sensible and often desirable to apply several of the different kinds of methods at once, that is, to apply a joint method with parts or aspects of more than one kind. For different tasks, or for different parts of one task, one may want to emphasize one kind of method over another, but these choices can be made smoothly, and at the time the methods are used rather than at the time they are designed. In Part III of the book we present a unified view of the three kinds of elementary solution methods introduced in Part II.

The unifications we present in this part of the book are not rough analogies. We develop specific algorithms that embody the key ideas of one or more of the elementary solution methods. First we present the mechanism of eligibility traces, unifying Monte Carlo and temporal-difference methods. Then we bring in function approximation, enabling generalization across states and actions. Finally we reintroduce models of the environment to obtain the strengths of dynamic programming and heuristic search. All of these can be used synergistically as parts of joint methods.

Richard Sutton
Fri May 30 21:18:34 EDT 1997