- ...algorithms.
- The iterative policy evaluation algorithm
is an example of a classical successive approximation algorithm for solving a
system of linear equations (e.g., Varga, 1962). The version of
the algorithm that uses two arrays, one holding the old values while the other is
updated, is often called a Jacobi-style algorithm, after Jacobi's
classical use of this method. It is also sometimes called a synchronous
algorithm because it can be performed in parallel, with separate processors
simultaneously updating the values of individual states using input from other
processors. The second array is needed to sequentially simulate this parallel
computation. The in-place version of the algorithm is often called a
Gauss-Seidel-style algorithm after the classical Gauss-Seidel algorithm for
solving systems of linear equations. In addition to iterative policy evaluation,
other DP algorithms can be implemented in these different versions. Bertsekas and
Tsitsiklis (1989) provide excellent coverage of these
variations and their performance differences.
Richard Sutton
Sat May 31 11:09:21 EDT 1997