Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Reinforcement Learning in Artificial Intelligence
Created by Rich Sutton Jan 9 2006
The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 499/609 (a course at the University of Alberta in Winter 2007).  There are also pointers here to slides for the course and to related courses elsewhere.  New stuff is in red.

If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page.  Then you will be kept apprised of announcements related to the course.

You can add comments or questions to all web pages with an "Extend" link at the bottom of the page.  If you click the notify box you might get a timely answer.

Thought questions should be submitted in hardcopy in class on the day they are due.  Send them by email to sutton@cs.ualberta.ca only if hardcopy submission on that day is not possible for you. 

Written exercises should be submitted in paper at the beginning of class on the day they are due.

See the course schedule for the list of exercises to be completed for each chapter and generally for the assignment due at each class.  If there is a due date in a programming exercise that differs from that given in the schedule, believe the schedule.

Pretty soon, if you are at all mathematically inclined, you will want to read the Ross chapter below.

Here is the presentation schedule, including practice talks:

Slides

Web viewable (somewhat old):
Source files (PowerPoint) as a tar archive (Jan 06)

Some Related Courses Elsewhere



When writing the pseudocode for question 2.5, please keep it simple. There should be no need to explain data structures or implementation details of that type. For examples of the level of detail expected for this question, refer to examples of pseudocode in the book, particularly Figure 4.1 (p.92) and Figure 5.1 (p.113).

To give you a sense of how your assignment will be graded, here's a likely mark distribution for this exercise:
   2.1:  2pts.
   2.5:  8pts.
  2.55: 28pts.
   2.8:  2pts. (extra credit)  

Here is the mark distribution for the Chapter 3 exercises:

   Questions 3.5 & 3.11          - 4 pts. each
   Questions 3.9 & 3.15          - 5 pts. each
   Questions 3.4, 3.10, & 3.17   - 6 pts. each
   Question  3.8                 - 8 pts.
   Question  3.6 (Extra Credit)  - 2 pts.

 

Here is the mark distribution for the Chapter 4 written exercises:

   Exercise 4.1 - 6 pts.
   Exercise 4.2 - 7 pts.
   Exercise 4.3 - 9 pts.
   Exercise 4.5 - 8 pts.
   Exercise 4.9 - 4 pts.  

Exercise 4.9 refers to equation (4.10), which should be interpreted as having two parts (really two equations).  You should do both equations (2 pts for each).

Regarding your RL-Glue assignment:

I encourage everyone to use the discussion and FAQ pages on /RLBB/top.html. This way you can extend the pages and ask me questions in a public way so that the whole class will benefit. Also, feel free to email me any questions or concerns regarding the "Glue".

Cheers,
Adam - awhite@cs.ualberta.ca

RL-Glue / RL-Library lecture slides: /RLL/classPres.pdf

Cheers,
Adam  

The mark distribution for the Chapter 5 written exercises is:

   Exercise 5.1 - 3 pts.
   Exercise 5.2 - 4 pts.
   Exercise 5.5 - 4 pts.  

The mark distribution for the written exercises of chapter 7 is as follows:
   Exercise 7.2 - 3 pts.
   Exercise 7.6 - 6 pts.  

Here's the mark distribution of the Chapter 8 written exercises:
   Exercise 8.1 - 3 pts.
   Exercise 8.2 - 4 pts.
   Exercise 8.6 - 3 pts.
   Exercise 8.7 - 2 pts.  
For those of you who chose to use the Quickgraph software for the function approximation assignment, note that there is a bug in the program. On line 300 of graph3d.py, it currently says:

   wzmings = window[2][1]

But it should really say:

   wzmaxgs = window[2][1]

since wzmings is set properly on the line before it and wzmaxgs is not being set. You may need to fix this yourself. Sorry for the inconvenience. 

Thanks to David Thue for finding this bug and the fix. 

For the mountain car assignment, don't try to do anything fancy in implementing eligibility traces.  Don't try to keep track of which traces are significantly non-zero and decay only them.  Just pick the memory size reasonably small so that it doesn't take forever to do the trace decay step.
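The advice above can be illustrated with a minimal sketch, assuming linear function approximation over binary features (for example from tile coding) and replacing traces. The names MEMORY_SIZE, traced_update, and active_features are hypothetical and not part of any course software; the point is only that every trace is decayed on every step, with no bookkeeping about which traces are non-zero.

```python
import numpy as np

# Assumed value: small enough that decaying every trace each step stays cheap.
MEMORY_SIZE = 4096


def traced_update(theta, traces, active_features, delta, alpha, gamma, lam):
    """Apply one learning step using a dense trace vector.

    theta, traces   -- NumPy arrays of the same length, modified in place
    active_features -- indices of the binary features present this step
    delta           -- the TD error for this step (computed by the caller)
    """
    traces[active_features] = 1.0     # replacing traces (an assumption here)
    theta += alpha * delta * traces   # each weight moves in proportion to its trace
    traces *= gamma * lam             # decay ALL traces; no tracking of non-zero ones


theta = np.zeros(MEMORY_SIZE)
traces = np.zeros(MEMORY_SIZE)
traced_update(theta, traces, [1, 3], delta=2.0, alpha=0.5, gamma=1.0, lam=0.5)
```

The decay is a single pass over an array of MEMORY_SIZE numbers, which is why a modest memory size keeps the step fast.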

This assignment is challenging to complete in a week.  In this course I don't usually get as far as a complete program for RL including function approximation and traces.  It is an important, non-trivial step.  Congratulations.

The distribution of points for the Chapter 9 written exercises:

   Exercise 9.1 - 4 pts.
   Exercise 9.2 - 3 pts.
   Exercise 9.3 - 3 pts.
   Exercise 9.5 - 6 pts.
   Exercise 9.6 - 2 pts. (Extra Credit)  


For those of us who don't have a hard copy of the text book yet: the exercises for chapter 2 have the same numbers in the book as in the online version. There is one small difference: online, in ex. 2.5, it should be alpha = (1/k) instead of (alpha = 1) / k.  
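To see why the grouping matters, here is a small sketch of an incremental update with step size alpha = 1/k, where k counts the rewards seen so far. With this step size the estimate equals the plain average of those rewards. The function name incremental_average is hypothetical, and this is only the averaging step, not the full algorithm the exercise asks for.

```python
def incremental_average(rewards):
    """Return the running estimate after processing each reward once."""
    q = 0.0
    for k, r in enumerate(rewards, start=1):
        alpha = 1.0 / k       # step size for the k-th reward
        q += alpha * (r - q)  # move the estimate toward the new reward
    return q


estimate = incremental_average([1.0, 2.0, 3.0, 6.0])
```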

For the thought questions:

Note that the thought questions should be sent to sutton@cs.ualberta.ca, not anna@cs.ualberta.ca as it used to say on this page.

To help me find your thought questions in my inbox, please put "[thought]" in your message title.
-RS  

Regarding the textbook:

Apparently, and contrary to what I said in class, there are no copies of the textbook in the bookstore.

Apparently, there is a scanned PDF copy of the book on the internet available to UofA students.  Varun will send the URL.

You can access the online version of the textbook through the www.library.ualberta.ca website. Search for "reinforcement learning sutton", and it should be the first link.  Follow the "UA Internet Access" link.  

[there is now a direct link in the menu above]

Lots of changes to the course web page and to the assignments for thursday.  (aren't you glad you subscribed?)
-rs  

if you have not already done so, i would appreciate it if you officially registered for the course by the end of tomorrow, which i think is the last day you can.  if you are a student in a department other than CS, then to do this you must see edith drummond in the CS dept office on the 2nd floor of athabasca hall before 4pm.  
thanks,
rich  

The assignments due on thursday, january 18, have been changed, partly because we are a little bit behind, but more so that we can spend more time on chapter 3, which is a little long and critical to the course. See the schedule for the details, but basically the chapter 2 exercises are pushed to next tuesday and a followup to the jeopardy quiz (see highlighted menu item below) has been added for this thursday in their place.  If you don't see this note in time and do these in the opposite (original) order, that is fine.

Other events in the schedule are pushed back accordingly.

Note that thought questions are due on thursday for chapters 1 and 2 if you have not sent them to me before.  Although we are switching to hardcopy submission for the future, you do not have to resubmit in hardcopy what you have already submitted in email.

Please note that exercise 2.55 can be found in the menu below.


it is important to do well on the chapter 3 exercises.  to make sure that everybody can do this, i'd like to have another class on chapter 3 before requiring the exercises to be completed.  so, the chapter 3 exercises will be due on thursday, one class later than previously scheduled.  i would still like to try to have the first programming assignment due on that day as well, but we will have to see how it goes.

-rich  

On page 61, the upper limit in Equation 3.3 should be T-t-1.  
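A short sketch of why the limit matters, assuming Equation 3.3 is the discounted return summed over k with terms gamma^k * r_{t+k+1} and that the episode ends at time T. With the upper limit T-t-1, the last reward included is r_T; any larger limit would index rewards past the end of the episode. The function name and the list layout are hypothetical.

```python
def discounted_return(rewards, t, gamma):
    """Return the sum of gamma**k * r_{t+k+1} for k = 0 .. T-t-1.

    rewards[i] holds r_i for i = 1..T; rewards[0] is an unused placeholder
    so that list indices match the reward subscripts.
    """
    T = len(rewards) - 1
    # range(T - t) yields k = 0 .. T-t-1, so the largest index used is T.
    return sum(gamma ** k * rewards[t + k + 1] for k in range(T - t))


episode = [None, 1.0, 2.0, 3.0]  # r_1, r_2, r_3, so T = 3
value = discounted_return(episode, t=0, gamma=0.5)
```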

Please note that only the first part of the party-problem programming assignment is due on tuesday the 6th. The second part is due a week later.

Thanks to Andrew for pointing out that this was not indicated correctly on the web site.  It has now been corrected.  The course schedule is always your most reliable guide.

rich  

I have reworked the schedule so that the assignment for chapter is due on tuesday the 13th, the next programming assignment due the 15th, and so on.

To complete our discussion of state, and make sure you got it, here is a little puzzler, a micro-assignment for next class:

State Assignment:

Consider a world of a single grid cell one side of which is open (if you face this side the sensation is 0; otherwise it is 1). There are three actions: turn right, turn left, and spin. Spin changes your orientation to a random direction, 25% to each of the four.

How many states are there (in the Monad sense) and what are they?

Turn your answers in on paper for thursday's class.
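If it helps to experiment, the world described above can be simulated with a few lines. This is only a sketch of the environment, not an answer to the question. The orientation numbering, the choice of side 0 as the open side, and the names OneCellWorld, sensation, and step are all assumptions made for illustration.

```python
import random

OPEN_SIDE = 0  # assumption: orientations are numbered 0..3 and side 0 is open


class OneCellWorld:
    """A single grid cell; the agent only ever changes its orientation."""

    def __init__(self, orientation=0, rng=None):
        self.orientation = orientation
        self.rng = rng or random.Random()

    def sensation(self):
        # 0 when facing the open side, 1 when facing a wall
        return 0 if self.orientation == OPEN_SIDE else 1

    def step(self, action):
        if action == "right":
            self.orientation = (self.orientation + 1) % 4
        elif action == "left":
            self.orientation = (self.orientation - 1) % 4
        elif action == "spin":
            # each of the four orientations is equally likely (25% each)
            self.orientation = self.rng.randrange(4)
        else:
            raise ValueError("unknown action: " + repr(action))
        return self.sensation()


world = OneCellWorld()
observations = [world.step(a) for a in ["right", "right", "spin", "left"]]
```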

dear class,

only two people got the state mini-assignment correct: vlad and elliot (sorry andrew b., your number was correct but not your explanation).  so i'd like everybody else to try it again. here's a hint: the correct answer is greater than 5.

turn it in on tuesday, with the chapter 4 exercises.

rich  

Hi all,

For those of you who wish to write Python agents and/or environments with RL-Glue, I have added 3 new projects to RL-Library. They illustrate using:

A Python agent with a C environment
A Python environment with a C agent
A Python agent with a Python environment

See the Project Shelf of RL-Library (/RLR/project.html).

Also, please note RL-Glue cannot be downloaded from RL-Library. The library contains agents, environments, experiments and Projects only! Download RL-Glue from Sourceforge (see: http://sourceforge.net/projects/rl-glue)

Cheers,  

In the book the player hits automatically until the player has a sum of 12 or greater.  Below is a modified environment so you only have to learn a policy from a sum of 12 or greater, following the description in the book.

http://www.cs.ualberta.ca/~butcher/Blackjack.cpp  
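For orientation, the automatic-hit rule can be sketched as follows. This is not the contents of Blackjack.cpp; it is a hypothetical illustration assuming an infinite deck, face cards counting as 10, and an ace counting as 11 whenever that does not exceed 21. The names draw_card and deal_to_twelve are made up for this sketch.

```python
import random


def draw_card(rng):
    """Draw from an infinite deck: 1 is an ace, face cards count as 10."""
    return min(rng.randint(1, 13), 10)


def deal_to_twelve(rng):
    """Hit automatically until the player's sum is 12 or greater.

    Returns (total, usable_ace). Below 12 a hit can never bust, so no
    decision needs to be learned before this point.
    """
    total, usable_ace = 0, False
    while total < 12:
        card = draw_card(rng)
        if card == 1 and total + 11 <= 21:
            total += 11          # count the ace as 11 while that is safe
            usable_ace = True
        else:
            total += card
    return total, usable_ace


start_total, start_ace = deal_to_twelve(random.Random(0))
```

Learning then begins from the returned sum, which always lies between 12 and 21.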

in class we thought april 10 was the last day of class, but actually it is the 12th. this affects the schedule for the student presentations, which will now be on the 10th and 12th, in the same order as previously planned.  -rich

i've made some changes to the schedule.  we will do more FA on thursday and i am moving the rest later by one class.  i'll be accepting the 1st FA assignment on thursday without penalty.

rich  

the schedule of presentations and practice talks has been added to the website. let me know asap if any of the times will not work for you.
rich  

the final programming project is nominally due thursday, and that would be a good time to turn it in, but i will accept them without penalty at the final exam on the 19th.
-rich  

Extend this Page   How to edit   Style   Subscribe   Notify   Suggest   Help   This open web page hosted at the University of Alberta.   Terms of use