Code for:

Reinforcement Learning: An Introduction
by Richard S. Sutton and Andrew G. Barto

Below are links to a variety of software related to examples and exercises in the book, organized by chapter (some files appear in multiple places). See particularly the Mountain Car code. Most of the rest of the code is written in Common Lisp and requires utility routines available here. For the graphics, you will need the packages for G and, in some cases, my graphing tool. Even if you cannot run this code, it may still clarify some of the details of the experiments. However, there is no guarantee that the examples in the book were run using exactly the software given. This code has also not been extensively tested or documented and is being made available "as is". If you have corrections, extensions, additions, or improvements of any kind, please send them to me at rich@richsutton.com for inclusion here.

Matlab code for nearly all the examples and exercises in the book has been contributed by John Weatherwax. Thanks, John!

Chapter 1: Introduction
- Tic-Tac-Toe Example (Lisp), (C version)
Chapter 2: Evaluative Feedback
- 10-armed Testbed Example, Figure 2.1 (Lisp)
- Testbed with Softmax Action Selection, Exercise 2.2 (Lisp)
- Bandits A and B, Figure 2.3 (Lisp)
- Testbed with Constant Alpha, cf. Exercise 2.7 (Lisp)
- Optimistic Initial Values Example, Figure 2.4 (Lisp)
- Code Pertaining to Reinforcement Comparison: File1, File2, File3 (Lisp)
- Pursuit Methods Example, Figure 2.6 (Lisp)
Chapter 3: The Reinforcement Learning Problem
- Pole-Balancing Example, Figure 3.2 (C)
- Gridworld Example 3.8, Code for Figures 3.5 and 3.8 (Lisp)
Chapter 4: Dynamic Programming
Chapter 5: Monte Carlo Methods
- Monte Carlo Policy Evaluation, Blackjack Example 5.1, Figure 5.2 (Lisp)
- Monte Carlo ES, Blackjack Example 5.3, Figure 5.5 (Lisp)
Chapter 6: Temporal-Difference Learning
- TD Prediction in Random Walk, Example 6.2, Figures 6.5 and 6.6 (Lisp)
- TD Prediction in Random Walk with Batch Training, Example 6.3, Figure 6.8 (Lisp)
- TD Prediction in Random Walk (Matlab by Jim Stone)
- R-learning on Access-Control Queuing Task, Example 6.7, Figure 6.17 (Lisp), (C version)
Chapter 7: Eligibility Traces
- N-step TD on the Random Walk, Example 7.1, Figure 7.2: online and offline (Lisp), (C version)
- lambda-return Algorithm on the Random Walk, Example 7.2, Figure 7.6 (Lisp)
- Online TD(lambda) on the Random Walk, Example 7.3, Figure 7.9 (Lisp)
Chapter 8: Generalization and Function Approximation
Chapter 9: Planning and Learning
- Trajectory Sampling Experiment, Figure 9.14 (Lisp)
Chapter 10: Dimensions of Reinforcement Learning
Chapter 11: Case Studies
- Acrobot (Lisp, environment only)
- Java Demo of RL Dynamic Channel Assignment

For other RL software see the Reinforcement Learning Repository at Michigan State University and here.
