So far we have discussed three classes of methods for solving the reinforcement learning problem: dynamic programming, Monte Carlo methods, and temporal-difference learning. Although these methods differ, they are not alternatives in the sense that one must choose among them. It is perfectly sensible, and often desirable, to apply methods of several kinds simultaneously, that is, to apply a joint method with parts or aspects of more than one kind. For different tasks, or for different parts of one task, one may want to emphasize one kind of method over another, but these choices can be made smoothly and at the time the methods are used, rather than at the time they are designed. In Part III we present a unified view of the three kinds of elementary solution methods introduced in Part II.
The unifications we present in this part of the book are not rough analogies. We develop specific algorithms that embody the key ideas of one or more of the elementary solution methods. First we present the mechanism of eligibility traces, unifying Monte Carlo and temporal-difference methods. Then we bring in function approximation, enabling generalization across states and actions. Finally we reintroduce models of the environment to obtain the strengths of dynamic programming and heuristic search. All of these can be used synergistically as parts of joint methods.