- ...selection.
- The difference between instruction and evaluation can be clarified by
contrasting two types of function optimization algorithm. One type is used
when information about the gradient of the function being minimized (or
maximized) is directly available. The gradient instructs the algorithm how to
move in the search space. The errors used by many supervised learning
algorithms are gradients (or approximate gradients). The other type of
algorithm uses only function values, corresponding to evaluative information,
and has to actively probe the function at additional points in the search
space in order to decide where to go next. Classical examples of the two
types are, respectively, the Robbins-Monro and Kiefer-Wolfowitz stochastic
approximation algorithms (see, e.g., Kashyap, Blaydon, and Fu, 1970).
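To make the contrast concrete, here is a minimal Python sketch (not from the
original text) of the two schemes applied to a noisy one-dimensional
quadratic. The Robbins-Monro iteration follows a directly available noisy
gradient, while the Kiefer-Wolfowitz iteration sees only noisy function
values and must probe at two extra points to form a finite-difference
estimate of the gradient. All names, step sizes, and probe widths are
illustrative choices.

    import random

    def noisy_value(x):
        # Noisy observation of f(x) = (x - 2)^2, whose minimum is at x = 2.
        return (x - 2.0) ** 2 + random.gauss(0.0, 0.1)

    def noisy_gradient(x):
        # Noisy observation of f'(x) = 2(x - 2): instructive information.
        return 2.0 * (x - 2.0) + random.gauss(0.0, 0.1)

    def robbins_monro(x0, steps=1000):
        # Gradient available: move directly along it; no probing needed.
        x = x0
        for n in range(1, steps + 1):
            a_n = 1.0 / n                  # decreasing step size
            x -= a_n * noisy_gradient(x)
        return x

    def kiefer_wolfowitz(x0, steps=1000):
        # Only function values available: probe f at x +/- c_n and use
        # a finite-difference estimate of the gradient.
        x = x0
        for n in range(1, steps + 1):
            a_n = 1.0 / n                  # decreasing step size
            c_n = 1.0 / n ** (1.0 / 3.0)   # shrinking probe width
            grad_est = (noisy_value(x + c_n) - noisy_value(x - c_n)) / (2.0 * c_n)
            x -= a_n * grad_est
        return x

    print(robbins_monro(0.0))     # both estimates should approach 2.0
    print(kiefer_wolfowitz(0.0))

Note that each Kiefer-Wolfowitz step costs two function evaluations; the
extra probing is the price of having only evaluative information.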
- ...probability.
- This is actually a considerable simplification of these
learning automata algorithms. For example, they are also defined for more
than two actions (n > 2) and often use a different step-size parameter on
success than on failure. Nevertheless, the limitations identified in this
section still apply.
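For concreteness, the fuller scheme this footnote alludes to might look like
the following Python sketch: an n-action linear reward-penalty (L_R-P)
automaton with separate step-size parameters, here called alpha (used on
success) and beta (used on failure). The symbols and constants are
illustrative assumptions, not taken from the original.

    import random

    def lrp_update(p, chosen, success, alpha=0.1, beta=0.05):
        # One linear reward-penalty (L_R-P) update of the action
        # probabilities p. On success, probability mass moves toward the
        # chosen action with step size alpha; on failure, it moves away
        # with step size beta. Both updates keep p summing to 1.
        n = len(p)
        q = list(p)
        if success:
            for i in range(n):
                target = 1.0 if i == chosen else 0.0
                q[i] = p[i] + alpha * (target - p[i])
        else:
            for i in range(n):
                if i == chosen:
                    q[i] = (1.0 - beta) * p[i]
                else:
                    q[i] = beta / (n - 1) + (1.0 - beta) * p[i]
        return q

    p = [1.0 / 3.0] * 3                            # uniform start, n = 3 actions
    a = random.choices(range(3), weights=p)[0]     # sample an action
    p = lrp_update(p, a, success=True)             # update after a success

Setting beta = 0 recovers the reward-inaction (L_R-I) variant, in which
failures leave the probabilities unchanged.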