Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Reinforcement Learning in Artificial Intelligence
Created by Rich Sutton Jan 9 2006
The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 499/609 (a course at the University of Alberta in Spring 2006).  There are also pointers here to slides for the course and to related courses elsewhere.

If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page.  Then you will be kept apprised of announcements related to the course.

You can add comments or questions to all web pages with an "Extend" link at the bottom of the page.  If you click the notify box you might get a timely answer.

For the time being, thought questions should be emailed to anna@cs.ualberta.ca. 

Written exercises should be submitted on paper to Brian Booth at the beginning of class on the day they are due.  Please submit these in duplicate: an original and a photocopy.  The original will be marked and returned to you.


Web viewable (slightly old):
Source files (powerpoint) as a tar archive (Jan 06)

Related Courses Elsewhere

The online exercises for Chapter 2 have been updated to hopefully save you some work. Also, when writing the pseudocode for question 2.5, please keep it simple. There should be no need to explain data structures or implementation details of that type. For the level of detail expected for this question, refer to the pseudocode in the book, particularly Figure 4.1 (p.92) and Figure 5.1 (p.113).

To give you a sense of how your assignment will be graded, here's a likely mark distribution for this exercise:
   2.1:  2pts.
   2.5:  8pts.
  2.55: 28pts.
   2.8:  2pts. (extra credit)  

The description of the party problem is now online. Hopefully, it's described just the way you remember it. Also, here is the mark distribution for the Chapter 3 exercises:

   Questions 3.5 & 3.11          - 4 pts. each
   Questions 3.9 & 3.15          - 5 pts. each
   Questions 3.4, 3.10, & 3.17   - 6 pts. each
   Question  3.8                 - 8 pts.
   Question  3.6 (Extra Credit)  - 2 pts.

Finally, handouts of the solutions to the exercises from chapters 2 & 3 will be made available at the next lecture for all those who turned in answers. Until then, keep fit and have fun.  

Hi all,

The description of the second programming assignment is now online (on the party problem page). If the description is unclear, let me know. Also, here is the mark distribution for the Chapter 4 written exercises:

   Exercise 4.1 - 6 pts.
   Exercise 4.2 - 7 pts.
   Exercise 4.3 - 9 pts.
   Exercise 4.5 - 8 pts.
   Exercise 4.9 - 4 pts.  

Exercise 4.9 refers to equation (4.10), which should be interpreted as having two parts (really two equations).  You should do both equations (2 pts. for each).

Beginning on Tuesday Feb 7, CMPUT 499/609 will meet in CSC-B43.  
cu there  

Due to the tendency of UofA students to party all night (depending on what you call a party), the due date for the DP/Party programming assignment is extended until Thursday, February 9th.

Your marks for the written & programming exercises are now online (see link on course page). The marks are posted by assigned 'Mark ID' since I don't have people's student IDs. You will find your Mark ID written on your first programming assignment when I return it to you in class today.  

Regarding your RL-Glue assignment:

I have updated the code on the website. This was to correct a couple of things in the build script for Linux users. PLEASE GET THE NEW VERSION.

I encourage everyone to use the discussion and FAQ pages on /RLBB/top.html. This way you can extend the pages and ask me questions in a public way so that the whole class will benefit. Also, feel free to email me any questions or concerns regarding the "Glue".

Adam - awhite@cs.ualberta.ca

In case you missed the RL-Glue / RL-Library lecture, you can get my slides here: /RLL/classPres.pdf


The Blackjack programming assignment description is now available on the webpage. If it's not clear, please let me know. Also, the mark distribution for the Chapter 5 written exercises is:

   Exercise 5.1 - 3 pts.
   Exercise 5.2 - 4 pts.
   Exercise 5.5 - 4 pts.  

The Blackjack environment is now available on the RL-library environment page.


Note that I made a mistake with the mark distribution of the Chapter 5 written exercises. You should be answering exercises 5.1, 5.2, and 5.5, not exercise 5.3. Sorry about that.

The MC programming assignment link does not seem to be working.  

The MC programming assignment link should be working now.  

According to the new requirement in the programming assignment that each card is limited in number, it seems that a state is now defined by the dealer's shown card and the number of each card left in the deck, rather than by just 3 variables: player's sum, usable ace, dealer's shown card.  If that is true, then the state space has too many dimensions and it is impossible to graph it the way the text does.

The description of the modified blackjack was a little ambiguous on this point, but the state space remains the same - you do not need to keep track of which cards have been dealt because they are dealt with replacement.

I've re-worded the problem description so that this should now be crystal clear.  Please speak up again if it's not.  
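To make the point about the state space concrete, here is a minimal sketch of dealing with replacement. It assumes the usable-ace rule from the textbook's blackjack example; the function names are made up for illustration and are not the RL-Library environment's interface, and the modified rules in the assignment may differ in other details.

```python
import random

# Card values when dealing with replacement: ace = 1, face cards count as 10.
# Every draw sees this same distribution, so no deck contents need tracking.
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw_card(rng):
    """Draw one card value (hypothetical helper, not the RL-Library API)."""
    return rng.choice(CARD_VALUES)


def hand_value(cards):
    """Return (sum, usable_ace) for a list of card values."""
    total = sum(cards)
    # An ace is usable if counting it as 11 does not bust the hand.
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


def make_state(player_cards, dealer_showing):
    """The state is only three variables, regardless of cards dealt so far."""
    player_sum, usable_ace = hand_value(player_cards)
    return (player_sum, usable_ace, dealer_showing)


if __name__ == "__main__":
    rng = random.Random(0)
    player = [draw_card(rng), draw_card(rng)]
    print(make_state(player, draw_card(rng)))
```

Because the state is just (player's sum, usable ace, dealer's shown card), it can still be graphed the way the text does.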

The cat mouse environment is now available on the environment shelf of the RL-Library. Please feel free to point out any bugs or inefficiencies in the code as I converted it fairly quickly this morning.


I found a bug in the Blackjack code that may have been affecting the performance on some systems. Updated code can be found in the RL-Library on the environment shelf.

Hi all,

The midterm has been marked and the marks are posted online. You will have a chance to view your midterm in today's class, but because of the amount of material that remains to be covered in the course, we won't be discussing it in detail. If you have any questions about the midterm, or how it was marked, please come see me outside of class time.

A couple of updates about future assignments:

(a) The Sarsa programming assignment (cat and mouse) has been cancelled. However, there will be a future Function Approximation assignment that will involve TD methods (its description will be up soon-ish).

(b) The mark distribution for the written exercises of chapter 7 (due Tuesday) is as follows:
   Exercise 7.2 - 3 pts.
   Exercise 7.6 - 6 pts.  

Hey all,

First off, the first function approximation programming assignment is now online.

As for the marking of the backup diagrams on the midterm, let me first explain my reasoning. In the RL framework we've been looking at, we don't consider actions independently. It doesn't really make sense to ask how good is an action. Instead, we look at actions from a given state: state-action pairs. The same holds for the Bellman equations and update rules. It doesn't make sense to talk about the value of an action on its own. Backup diagrams are a way of visually representing these equations. The hope is that, with one, the other can be quickly determined. So backup diagrams should also refer to state-action pairs instead of actions.
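As an illustration of the kind of equation a backup diagram stands for, the Bellman equation for action values can be written roughly as follows. This is a sketch in what I take to be the textbook's notation, with \mathcal{P} the transition probabilities and \mathcal{R} the expected rewards; check it against the book before relying on it.

```latex
Q^{\pi}(s,a) = \sum_{s'} \mathcal{P}^{a}_{ss'}
  \Bigl[ \mathcal{R}^{a}_{ss'} + \gamma \sum_{a'} \pi(s',a')\, Q^{\pi}(s',a') \Bigr]
```

The left-hand side is indexed by the pair (s,a), so the root of the matching backup diagram is a state-action pair rather than an action on its own.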

Rich and I discussed this, and though he understood my reasoning, he thought I was being too harsh given that it was an exam environment. He also said I should lighten up and stop watching Sean Penn movies and Law & Order.

So in conclusion:
   (a) Rich is right.
   (b) I've made my point.
   (c) Your midterm marks have been corrected.

Hey all,

The description of the second function approximation assignment is now available online (on the same page as the description of the first function approximation assignment). If the description is unclear, please let me know. The assignment is due March 28th.

Also, here's the mark distribution of the Chapter 8 written exercises:
   Exercise 8.1 - 3 pts.
   Exercise 8.2 - 4 pts.
   Exercise 8.6 - 3 pts.
   Exercise 8.7 - 2 pts.  

For those of you who chose to use the Quickgraph software for the function approximation assignment, note that there is a bug in the program. On line 300 of graph3d.py, it currently says:

   wzmings = window[2][1]

But it should really say:

   wzmaxgs = window[2][1]

since wzmings is set properly on the line before it and wzmaxgs is not being set. You may need to fix this yourself. Sorry for the inconvenience. 

Thanks to David Thue for finding this bug and the fix. 

For the mountain car assignment, don't try to do anything fancy in implementing eligibility traces.  Don't try to keep track of which traces are significantly non-zero and decay only them.  Just pick the memory size reasonably small so that it doesn't take forever to do the trace decay step.
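A minimal sketch of the "nothing fancy" approach, assuming linear function approximation with binary features and replacing traces. The function name, argument names, and the choice of replacing traces are assumptions for illustration, not the assignment's required interface.

```python
def sarsa_lambda_update(theta, traces, active, delta, alpha, gamma, lam):
    """One linear Sarsa(lambda) update with binary features (illustrative).

    theta, traces: lists of equal length (the memory size), changed in place
    active: indices of the features active for the current state-action pair
    delta: TD error for this step
    """
    # Decay every trace in memory; no bookkeeping of which are non-zero.
    for i in range(len(traces)):
        traces[i] *= gamma * lam
    # Replacing traces: features active now get a trace of exactly 1.
    for i in active:
        traces[i] = 1.0
    # Move every weight in proportion to its trace.
    for i in range(len(theta)):
        theta[i] += alpha * delta * traces[i]


if __name__ == "__main__":
    memory_size = 8  # keep this small so the full decay loop stays cheap
    theta = [0.0] * memory_size
    traces = [0.0] * memory_size
    sarsa_lambda_update(theta, traces, [1, 3], delta=1.0,
                        alpha=0.5, gamma=1.0, lam=0.5)
    print(theta, traces)
```

The cost of each step is proportional to the memory size, which is why a reasonably small memory keeps the decay loop from dominating the run time.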

This assignment is challenging to complete in a week.  In this course I don't usually get as far as a complete program for RL including function approximation and traces.  It is an important, non-trivial step.  Congratulations.

Hi, here's a paper related to Curtis' comment on GPU matrix algorithms


Hey all,

Just a quick note to give you the distribution of points for the Chapter 9 written exercises (the last ones, yay):

   Exercise 9.1 - 4 pts.
   Exercise 9.2 - 3 pts.
   Exercise 9.3 - 3 pts.
   Exercise 9.5 - 6 pts.
   Exercise 9.6 - 2 pts. (Extra Credit)  

This is a heads up that there will be a short reading assignment with thought questions due on Tuesday, a week from today, the same day the mini-project proposal is due.  See this space for the URL to the reading.

The reading for Tuesday is http://www.cs.ualberta.ca/~bowling/papers/01ijcai.pdf


Just a reminder to those in CMPUT 609 that if you haven't submitted your mini-project proposals to Rich or me yet, make sure you do so before the end of the day. Thanks.

Hey all,

The final exam marks are now online. If you wish to view/discuss your exam, you can come visit me in my office sometime before 5pm next Wednesday (April 26). Thanks,  

This open web page hosted at the University of Alberta.