2 Evaluative Feedback
2.1 An n-armed Bandit Problem
2.2 Action-Value Methods
2.3 Softmax Action Selection
2.4 Evaluation versus Instruction
2.5 Incremental Implementation
2.6 Tracking a Nonstationary Problem
2.7 Optimistic Initial Values
2.8 Reinforcement Comparison
2.9 Pursuit Methods
2.10 Associative Search
2.11 Conclusion
2.12 Bibliographical and Historical Remarks
Richard Sutton
Sat May 31 12:02:11 EDT 1997