- ...selection.
- The difference between instruction and evaluation can be clarified by
contrasting two types of function optimization algorithm. One type is used
when information about the gradient of the function being minimized (or
maximized) is directly available. The gradient instructs the algorithm how to
move in the search space. The errors used by many supervised learning
algorithms are gradients (or approximate gradients). The other type of
algorithm uses only function values, corresponding to evaluative information,
and has to actively probe the function at additional points in the search
space in order to decide where to go next. Classical examples of the two
types are, respectively, the Robbins-Monro and Kiefer-Wolfowitz stochastic
approximation algorithms (see, e.g., Kashyap, Blaydon, and Fu, 1970).
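To make the contrast concrete, here is a minimal Python sketch (not from the
original text) of the two schemes applied to a noisy one-dimensional
quadratic. The Robbins-Monro iteration follows a directly available noisy
gradient, while the Kiefer-Wolfowitz iteration sees only noisy function
values and must probe at two extra points to form a finite-difference
estimate of the gradient. All names, step sizes, and probe widths are
illustrative choices.

    import random

    def noisy_value(x):
        # Noisy observation of f(x) = (x - 2)^2, whose minimum is at x = 2.
        return (x - 2.0) ** 2 + random.gauss(0.0, 0.1)

    def noisy_gradient(x):
        # Noisy observation of f'(x) = 2(x - 2): instructive information.
        return 2.0 * (x - 2.0) + random.gauss(0.0, 0.1)

    def robbins_monro(x0, steps=1000):
        # Gradient available: move directly along it; no probing needed.
        x = x0
        for n in range(1, steps + 1):
            a_n = 1.0 / n                  # decreasing step size
            x -= a_n * noisy_gradient(x)
        return x

    def kiefer_wolfowitz(x0, steps=1000):
        # Only function values available: probe f at x +/- c_n and use
        # a finite-difference estimate of the gradient.
        x = x0
        for n in range(1, steps + 1):
            a_n = 1.0 / n                  # decreasing step size
            c_n = 1.0 / n ** (1.0 / 3.0)   # shrinking probe width
            grad_est = (noisy_value(x + c_n) - noisy_value(x - c_n)) / (2.0 * c_n)
            x -= a_n * grad_est
        return x

    print(robbins_monro(0.0))     # both estimates should approach 2.0
    print(kiefer_wolfowitz(0.0))

Note that each Kiefer-Wolfowitz step costs two function evaluations; the
extra probing is the price of having only evaluative information.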
- ...probability.
- This is actually a considerable simplification of these
learning automata algorithms. For example, they are also defined for more
than two actions (n > 2) and often use a different step-size parameter on
success than on failure. Nevertheless, the limitations identified in this
section still apply.
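For concreteness, the fuller scheme this footnote alludes to might look like
the following Python sketch: an n-action linear reward-penalty (L_R-P)
automaton with separate step-size parameters, here called alpha (used on
success) and beta (used on failure). The symbols and constants are
illustrative assumptions, not taken from the original.

    import random

    def lrp_update(p, chosen, success, alpha=0.1, beta=0.05):
        # One linear reward-penalty (L_R-P) update of the action
        # probabilities p. On success, probability mass moves toward the
        # chosen action with step size alpha; on failure, it moves away
        # with step size beta. Both updates keep p summing to 1.
        n = len(p)
        q = list(p)
        if success:
            for i in range(n):
                target = 1.0 if i == chosen else 0.0
                q[i] = p[i] + alpha * (target - p[i])
        else:
            for i in range(n):
                if i == chosen:
                    q[i] = (1.0 - beta) * p[i]
                else:
                    q[i] = beta / (n - 1) + (1.0 - beta) * p[i]
        return q

    p = [1.0 / 3.0] * 3                            # uniform start, n = 3 actions
    a = random.choices(range(3), weights=p)[0]     # sample an action
    p = lrp_update(p, a, success=True)             # update after a success

Setting beta = 0 recovers the reward-inaction (L_R-I) variant, in which
failures leave the probabilities unchanged.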