- ...journey.
- If this were a control problem with the objective of minimizing
travel time, then we would of course make the rewards the negative
of the elapsed time. But since we are concerned here only with prediction (policy
evaluation), we can keep things simple by using positive numbers.
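
To make the sign convention concrete, here is a minimal sketch in Python. The leg times are hypothetical numbers chosen only for illustration, and `returns_to_go` is a helper name introduced here, not something from the text. With positive rewards the undiscounted return from each point is simply the remaining travel time; negating the rewards negates every return, so maximizing return would then correspond to minimizing time.

```python
def returns_to_go(rewards):
    """Undiscounted return from each step: the sum of rewards from that step on."""
    out = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


# Hypothetical elapsed minutes on each leg of a trip (illustrative only).
elapsed = [5.0, 15.0, 10.0, 10.0, 3.0]

# Prediction setting: rewards are the elapsed times themselves (positive).
prediction_returns = returns_to_go(elapsed)

# Control setting: rewards are the negative of the elapsed times.
control_returns = returns_to_go([-t for t in elapsed])

print(prediction_returns)
print(control_returns)
```

For pure prediction the two conventions carry the same information, since each return under one is just the negative of the return under the other.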
Richard Sutton
Fri May 30 13:53:05 EDT 1997