Code for:

Reinforcement Learning: An Introduction, 2nd edition
by Richard S. Sutton and Andrew G. Barto

Below are links to a variety of software related to examples and exercises in the book.

And below is some of the code that Rich used to generate the examples and figures in the 2nd edition (made available as is):

Chapter 1: Introduction
- Tic-Tac-Toe Example (Lisp). In C.
Chapter 2: Multi-armed Bandits
Chapter 3: Finite Markov Decision Processes
- Pole-Balancing Example, Example 3.4 (C)
- Gridworld Example 3.5 and 3.8, Code for Figures 3.2 and 3.5 (Lisp)
Chapter 4: Dynamic Programming
Chapter 5: Monte Carlo Methods
Chapter 6: Temporal-Difference Learning
- TD Prediction in Random Walk, Example 6.2 (Lisp)
- TD Prediction in Random Walk with Batch Training, Example 6.3, Figure 6.2 (Lisp)
- TD Prediction in Random Walk (MATLAB by Jim Stone)
- Double Q-learning vs. conventional Q-learning, Example 6.7, Figure 6.5 (Lisp)
Chapter 7: n-step Bootstrapping
- N-step TD on the Random Walk, Example 7.1, Figure 7.2: online and offline (Lisp). In C.
Chapter 8: Planning and Learning with Tabular Methods
- Trajectory Sampling Experiment, Figure 8.8 (Lisp)
Chapter 9: On-policy Prediction with Approximation
Chapter 10: On-policy Control with Approximation

- Linear Semi-gradient Sarsa(lambda) on the Mountain-Car, Figure 10.1
- n-step Sarsa on Mountain Car, Figures 10.2-4 (Lisp) with tile coding
- R-learning on Access-Control Queuing Task, Example 10.2, Figure 10.5 (Lisp), (C version)

Chapter 11: Off-policy Methods with Approximation

- Baird Counterexample Results, Figures 11.2, 11.5, and 11.6 (Lisp)

Chapter 12: Eligibility Traces

- Offline lambda-return results, Figure 12.3 (Lisp)
- TD(lambda) and true online TD(lambda) results, Figures 12.6 and 12.8 (Lisp)
- Sarsa(lambda) on Mountain Car (Lisp) (Python: MC and Sarsa) with tile coding

Chapter 13: Policy Gradient Methods (this Python code is available on GitHub)