Code for: Reinforcement Learning: An Introduction
Below are links to a variety of software related to examples
and exercises in the book.
And below is some of the code that Rich used to generate the
examples and figures in the 2nd edition (made available as is):
- Chapter 1: Introduction
- Chapter 2: Multi-armed Bandits
- Chapter 3: Finite Markov Decision Processes
- Chapter 4: Dynamic Programming
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 7: n-step Bootstrapping
  - n-step TD on the Random Walk, Example 7.1, Figure 7.2: online and offline (Lisp); also in C
- Chapter 8: Planning and Learning with Tabular Methods
- Chapter 9: On-policy Prediction with Approximation
- Chapter 10: On-policy Control with Approximation
- Chapter 11: Off-policy Methods with Approximation
  - Baird Counterexample Results, Figures 11.2, 11.5, and 11.6 (Lisp)
- Chapter 12: Eligibility Traces
  - Offline lambda-return results, Figure 12.3 (Lisp)
  - TD(lambda) and true online TD(lambda) results, Figures 12.6 and 12.8 (Lisp)
  - Sarsa(lambda) with tile coding on Mountain Car (Lisp; Python: MC and Sarsa)
- Chapter 13: Policy Gradient Methods (the Python code is available on GitHub)