Reinforcement learning systems must be capable of *generalization*
if they are to be applicable to artificial intelligence or to large
engineering applications. To achieve this, any of a broad range of
existing methods for *supervised-learning function approximation* can be
used simply by treating each backup as a training example. *Gradient-descent methods*, in particular, allow a natural extension to
function approximation of all the techniques developed in previous chapters,
including eligibility traces. *Linear* gradient-descent methods are
particularly appealing theoretically and work well in practice when provided
with appropriate features. Choosing the features is one of the most important
ways of adding prior domain knowledge to reinforcement learning systems.
Linear methods include radial basis functions, tile coding, and
Kanerva coding. Backpropagation methods for multilayer neural networks are
methods for *nonlinear* gradient-descent function approximation.
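
To make the linear case concrete, below is a minimal sketch of gradient-descent TD($\lambda$) value prediction with a linear function approximator and accumulating eligibility traces. The class name `LinearTDLambda`, its parameters, and the one-hot features in the usage snippet are illustrative assumptions, not code from this chapter; the feature vector `phi` is assumed to come from whatever feature construction is chosen (e.g., tile coding).

```python
import numpy as np

class LinearTDLambda:
    """Gradient-descent TD(lambda) prediction with linear function
    approximation and accumulating eligibility traces (a sketch)."""

    def __init__(self, n_features, alpha=0.1, gamma=0.99, lam=0.8):
        self.theta = np.zeros(n_features)  # weight vector
        self.z = np.zeros(n_features)      # eligibility trace
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def value(self, phi):
        # Linear value estimate: V(s) = theta . phi(s)
        return self.theta @ phi

    def update(self, phi, reward, phi_next, done):
        # One-step TD error: delta = r + gamma * V(s') - V(s)
        target = reward if done else reward + self.gamma * self.value(phi_next)
        delta = target - self.value(phi)
        # Accumulating trace: z <- gamma * lambda * z + grad V(s),
        # and for a linear approximator grad V(s) is just phi(s)
        self.z = self.gamma * self.lam * self.z + phi
        # Gradient-descent weight update along the trace
        self.theta += self.alpha * delta * self.z
        if done:
            self.z[:] = 0.0  # reset the trace at episode boundaries

# Illustrative usage: one transition between two one-hot-coded states
agent = LinearTDLambda(n_features=8)
phi_s, phi_s2 = np.eye(8)[2], np.eye(8)[3]
agent.update(phi_s, reward=1.0, phi_next=phi_s2, done=False)
```

Note that the only place the features enter is through `phi`, which is why the choice of feature construction (tile coding, radial basis functions, Kanerva coding) carries so much of the prior domain knowledge.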

For the most part, the extension of reinforcement learning prediction and control methods to gradient-descent forms is straightforward. However, there is an interesting interaction between function approximation, bootstrapping, and the on-policy/off-policy distinction. Bootstrapping methods, such as DP and TD($\lambda$) for $\lambda < 1$, work reliably in conjunction with function approximation over a narrower range of conditions than do nonbootstrapping methods. Because the control case has not yet yielded to theoretical analysis, research has focused on the value prediction problem. In this case, on-policy bootstrapping methods converge reliably with linear gradient-descent function approximation to a solution with mean-squared error bounded by $\frac{1-\gamma\lambda}{1-\gamma}$ times the minimum possible error. Off-policy bootstrapping methods, on the other hand, may diverge to infinite error. Several approaches to making off-policy bootstrapping methods work with function approximation have been explored, but this remains an open research issue. Bootstrapping methods are of persistent interest in reinforcement learning, despite their limited theoretical guarantees, because in practice they usually work significantly better than nonbootstrapping methods.
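
Stated as a formula, the on-policy bound above reads (the notation $\overline{\mathrm{MSE}}$ for the mean-squared error weighted by the on-policy state distribution is introduced here for clarity):

$$
\overline{\mathrm{MSE}}(\boldsymbol{\theta}_\infty) \;\le\; \frac{1-\gamma\lambda}{1-\gamma}\,\min_{\boldsymbol{\theta}} \overline{\mathrm{MSE}}(\boldsymbol{\theta}).
$$

The factor shrinks toward $1$ as $\lambda \to 1$, recovering the nonbootstrapping (Monte Carlo) case, and equals $\frac{1}{1-\gamma}$ at $\lambda = 0$, so the guarantee weakens as the method bootstraps more heavily.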