At this point you may be wondering why we bother with bootstrapping methods at
all. Nonbootstrapping methods can be used with function approximation more
reliably and over a broader range of conditions than bootstrapping methods.
Nonbootstrapping methods achieve a lower asymptotic error than
bootstrapping methods, even when backups are done according to the on-policy
distribution. By using eligibility traces and , it is even possible to
implement nonbootstrapping methods on-line, in a step-by-step incremental
manner. Despite all this, in practice bootstrapping methods are usually the
methods of choice.
In empirical comparisons, bootstrapping methods usually perform much better
than nonbootstrapping methods. A convenient way to make such comparisons is to
use a TD method with eligibility traces and vary from 0 (pure bootstrapping)
to 1 (pure nonbootstrapping). Figure
8.15 summarizes a
collection of such results. In all cases, performance became much worse as
approached
, the nonbootstrapping case. The example in the upper right of the
figure is particularly significant in this regard. This is a policy evaluation
(prediction) task and the performance measure used is root MSE (at the end of each
episode, averaged over the first 20 episodes). Asymptotically, the
case must be best according to this measure, but here, short of the
asymptote, we see it performing much worse.
At this time it is unclear why methods that involve some bootstrapping perform so much better than pure nonbootstrapping methods. It could be that bootstrapping methods learn faster, or it could be that they actually learn something better than nonbootstrapping methods. The available results indicate that nonbootstrapping methods are better than bootstrapping methods at reducing MSE from the true value function, but reducing MSE is not necessarily the most important goal. For example, if you add 1000 to the true action-value function at all state-action pairs, then it will have very poor MSE, but you will still get the optimal policy. Nothing quite that simple is going on with bootstrapping methods, but they do seem to do something right. We expect the understanding of these issues to improve as research continues.