CMPUT 499/609

	Reinforcement Learning and Artificial Intelligence (RLAI)
	CMPUT 499/609: Reinforcement Learning in Artificial Intelligence

The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 499/609 (a course at the University of Alberta in Winter 2007). There are also pointers here to slides for the course and to related courses elsewhere. New stuff is in red.

If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page. Then you will be kept apprised of announcements related to the course.

You can add comments or questions to any web page that has an "Extend" link at the bottom. If you check the notify box, you may get a timely answer.

Thought questions should be submitted in hardcopy in class on the day they are due. Send them by email to sutton@cs.ualberta.ca only if hardcopy submission on that day is not possible for you.

Written exercises should be submitted on paper at the beginning of class on the day they are due.

See the course schedule for the list of exercises to be completed for each chapter and, more generally, for the assignment due at each class. If a programming exercise gives a due date that differs from the one in the schedule, believe the schedule.

Pretty soon, if you are at all mathematically inclined, you will want to read the Ross chapter below.

Here is the presentation schedule, including practice talks:

April 4, 11:00 Renee practice. consult Mohammad at mgh@cs.ualberta.ca
April 4, 11:40 Hussam practice. consult Mohammad at mgh@cs.ualberta.ca
April 5, 11:55 Renee presentation in class
April 5, 2:30 Wei Wei practice. consult Brian at btanner@cs.ualberta.ca
April 5, 3:10 Yasin practice. consult Brian at btanner@cs.ualberta.ca
April 10, 11:00- Wei Wei, Hussam, and Yasin presentation in class
April 10, 2:30 Varun practice. consult Adam at awhite@cs.ualberta.ca
April 11, 11:00 Vlad practice. consult David at silver@cs.ualberta.ca
April 11, 11:40 Andrew practice. consult David at silver@cs.ualberta.ca
April 12, 11:00 Varun, Vlad, and Andrew presentation in class

Course description, basic info on the course
Tentative Schedule
Online html version of the textbook
Scanned pdf version of the textbook (CogNet, UofA access only)
Thought questions
The Party Problem (Programming Assignment)
The Blackjack Programming Assignment
Function Approximation Assignments
Proof of the futility of discounting when using function approximation
Python information
Additional exercise for Chapter 2
RL-Library and RL-Glue: a standard for connecting RL agents and environments
Worldwide reinforcement learning research
Chapter II of Ross (1983)
LSTD(lambda) by Boyan (Reading)
FAQ (please refer to this before sending questions to the TA)
Refurbished versions of the pseudocode algorithms in Figures 8.8 and 8.9 are here for Sarsa(lambda) with FA and here for Watkins's Q(lambda) with FA.
The reinforcement learning hypothesis
The value function hypothesis
Graphing software in python
The RLtoolkit python package
pdf version of chapter 1 of the textbook
Jeopardy Quiz #1
marks

Slides

Web viewable (somewhat old):

Source files (powerpoint) as a tar archive (Jan 06)

Some Related Courses Elsewhere

Peter Stone's course at the University of Texas at Austin

a student's ideas on exercise 7.3

When writing the pseudocode for question 2.5, please keep it simple. There should be no need to explain data structures or implementation details of that type. For examples of the level of detail expected for this question, refer to examples of pseudocode in the book, particularly Figure 4.1 (p.92) and Figure 5.1 (p.113).

To give you a sense of how your assignment will be graded, here's a likely mark distribution for this exercise:
2.1: 2pts.
2.5: 8pts.
2.55: 28pts.
2.8: 2pts. (extra credit) TA Brian, Mon Jan 16 16:03:06 2006

Here is the mark distribution for the Chapter 3 exercises:

Questions 3.5 & 3.11 - 4 pts. each
Questions 3.9 & 3.15 - 5 pts. each
Questions 3.4, 3.10, & 3.17 - 6 pts. each
Question 3.8 - 8 pts.
Question 3.6 (Extra Credit) - 2 pts.

Here is the mark distribution for the Chapter 4 written exercises:

Exercise 4.1 - 6 pts.
Exercise 4.2 - 7 pts.
Exercise 4.3 - 9 pts.
Exercise 4.5 - 8 pts.
Exercise 4.9 - 4 pts. Brian (TA), Tue Jan 31 14:19:32 2006

Exercise 4.9 refers to equation (4.10), which should be interpreted as having two parts (really two equations). You should do both equations (2 pts for each). Rich, Tue Jan 31 15:51:36 2006

Regarding your RL-Glue assignment:

I encourage everyone to use the discussion and FAQ pages on /RLBB/top.html. This way you can extend the pages and ask me questions in a public way so that the whole class will benefit. Also, feel free to email me any questions or concerns regarding the "Glue".

Cheers,
Adam - awhite@cs.ualberta.ca Adam White, Thu Feb 9 13:57:07 2006

RL-Glue / RL-Library lecture slides: /RLL/classPres.pdf

Cheers,
Adam Adam White, Thu Feb 9 14:29:22 2006

The mark distribution for the Chapter 5 written exercises is:

Exercise 5.1 - 3 pts.
Exercise 5.2 - 4 pts.
Exercise 5.5 - 4 pts. Brian, Fri Feb 10 14:18:00 2006

The mark distribution for the written exercises of chapter 7 is as follows:
Exercise 7.2 - 3 pts.
Exercise 7.6 - 6 pts. Brian, Thu Mar 9 08:40:10 2006

Here's the mark distribution of the Chapter 8 written exercises:
Exercise 8.1 - 3 pts.
Exercise 8.2 - 4 pts.
Exercise 8.6 - 3 pts.
Exercise 8.7 - 2 pts. Brian, Fri Mar 17 14:26:42 2006

For those of you who chose to use the Quickgraph software for the function approximation assignment, note that there is a bug in the program. On line 300 of graph3d.py, it currently says:

wzmings = window[2][1]

But it should really say:

wzmaxgs = window[2][1]

since wzmings is already set correctly on the line before it, while wzmaxgs is never set. You may need to fix this yourself. Sorry for the inconvenience.

Thanks to David Thue for finding this bug and the fix. Brian, Mon Mar 20 14:16:09 2006

For the mountain car assignment, don't try to do anything fancy in implementing eligibility traces. Don't try to keep track of which traces are significantly non-zero and decay only them. Just pick the memory size reasonably small so that it doesn't take forever to do the trace decay step.

This assignment is challenging to complete in a week. In this course i don't usually get as far as a complete program for RL including function approximation and traces. It is an important, non-trivial step. Congratulations. rich, Tue Mar 21 13:55:20 2006
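To make the advice on eligibility traces concrete, here is a minimal Python sketch of the simple approach: keep one trace per weight in a small fixed-size memory and decay every trace on every step, with no bookkeeping of which traces are non-zero. The function names, the list-based memory, and the use of replacing traces are illustrative assumptions, not the course's reference code; accumulating traces would add 1 to the active traces instead of setting them to 1.

```python
def decay_and_mark(traces, active, gamma, lam):
    """Decay every eligibility trace, then set the active features' traces to 1.

    traces: list of floats, one entry per weight in the (small) memory.
    active: indices of the features (e.g. tiles) active for the current
            state-action pair.
    Replacing traces are assumed here; accumulating traces would add 1.0 instead.
    """
    decay = gamma * lam
    for i in range(len(traces)):  # decay all traces; no tracking of non-zero ones
        traces[i] *= decay
    for i in active:
        traces[i] = 1.0
    return traces


def apply_td_error(weights, traces, alpha, delta):
    """Move every weight along its trace by the step size times the TD error delta."""
    for i in range(len(weights)):
        weights[i] += alpha * delta * traces[i]
    return weights
```

Each step then costs time proportional to the memory size, which is why choosing a reasonably small memory keeps the decay step fast.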

The distribution of points for the Chapter 9 written exercises:

Exercise 9.1 - 4 pts.
Exercise 9.2 - 3 pts.
Exercise 9.3 - 3 pts.
Exercise 9.5 - 6 pts.
Exercise 9.6 - 2 pts. (Extra Credit) Brian, Tue Mar 28 16:29:30 2006

For those of us who don't have a hard copy of the textbook yet: the exercises for chapter 2 have the same numbers online as in the book. There is one small difference: online, in ex. 2.5, it should read alpha = (1/k) instead of (alpha = 1) / k. Renee, Thu Jan 11 12:37:51 2007
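With the corrected step size alpha = 1/k, where k counts how many rewards the action has received so far, the incremental update reproduces the ordinary sample average. A minimal Python sketch for a single action's estimate (the function name and the list-of-rewards interface are illustrative, not from the book):

```python
def incremental_average(rewards):
    """Estimate one action's value incrementally with step size alpha = 1/k.

    After the k-th reward, q equals the sample average of the first k rewards,
    so the past rewards never need to be stored.
    """
    q = 0.0  # initial estimate; it is fully replaced at k = 1 because alpha = 1
    k = 0
    for r in rewards:
        k += 1
        alpha = 1.0 / k       # alpha = (1/k), not (alpha = 1) / k
        q += alpha * (r - q)  # new estimate = old + step size * (target - old)
    return q
```

For example, rewards 1, 2, 3 and 6 give an estimate equal to their mean, 3.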

For the thought questions:

Note that the thought questions should be sent to sutton@cs.ualberta.ca, not anna@cs.ualberta.ca as it used to say on this page.

To help me find your thought questions in my inbox, please put "[thought]" in your message title.
-RS rich sutton, Thu Jan 11 13:10:17 2007

Regarding the textbook:

Apparently, and contrary to what i said in class, there are no copies of the textbook in the bookstore.

Apparently, there is a pdf scanned copy of the book on the internet available to UofA students. Varun will send the URL. rich sutton, Thu Jan 11 13:16:22 2007

You can access the online version of the textbook through the www.library.ualberta.ca website. Search for "reinforcement learning sutton", and it should be the first link. Follow the "UA Internet Access" link. Varun, Thu Jan 11 13:31:30 2007

[there is now a direct link in the menu above]

Lots of changes to the course web page and to the assignments for thursday. (aren't you glad you subscribed?)
-rs rich sutton, Tue Jan 16 18:41:37 2007

if you have not already done so, i would appreciate it if you officially registered for the course by the end of tomorrow, which i think is the last day you can. if you are a student in a department other than CS, then to do this you must see edith drummond in the CS dept office on the 2nd floor of athabasca hall before 4pm.
thanks,
rich rich sutton, Thu Jan 18 21:58:16 2007

The assignments due on thursday, january 18, have been changed, partly due to our being a little bit behind, but more so that we can spend more time on chapter 3, which is a little long and critical to the course. See the schedule for the details, but basically chapter 2 exercises are pushed to next tuesday and a followup to the jeopardy quiz (see highlighted menu item below) has been added for this thursday in its place. If you don't see this note in time and do these in the opposite (more original) order that is fine.

Other events in the schedule are pushed back accordingly.

Note that thought questions are due on thursday for chapters 1 and 2 if you have not sent them to me before. Although we are switching to hardcopy submission for the future, you do not have to resubmit in hardcopy what you have already submitted by email.

Please note that exercise 2.55 can be found in the menu below.

it is important to do well on the chapter 3 exercises. to make sure that everybody can do this, i'd like to have another class on chapter 3 before requiring the exercises to be completed. so, the chapter 3 exercises will be due on thursday, one class later than previously scheduled. i would still like to try to have the first programming assignment due on that day as well, but we will have to see how it goes.

-rich rich sutton, Thu Jan 25 22:29:41 2007

On page 61, the upper limit in Equation 3.3 should be T-t-1. Anonymous, Tue Jan 30 15:13:28 2007
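With upper limit T-t-1, the unified return reads R_t = sum_{k=0}^{T-t-1} gamma^k r_{t+k+1}, so the last reward included is r_T, the final reward of the episode; an upper limit of T would index rewards past the end of the episode. A short Python sketch, assuming the rewards of one finished episode are stored as a list [r_1, ..., r_T] (this indexing convention is an assumption made for the example):

```python
def discounted_return(rewards, t, gamma):
    """R_t = sum_{k=0}^{T-t-1} gamma**k * r_{t+k+1} for one finished episode.

    rewards is [r_1, r_2, ..., r_T], so rewards[i] holds r_{i+1} and T = len(rewards).
    """
    T = len(rewards)
    total = 0.0
    for k in range(T - t):  # k = 0, 1, ..., T-t-1 (the corrected upper limit)
        total += gamma ** k * rewards[t + k]  # rewards[t + k] is r_{t+k+1}
    return total
```

For instance, with three rewards of 1 and gamma = 0.5, R_0 = 1 + 0.5 + 0.25 = 1.75, and R_2 is just r_3 = 1.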

Please note that only the first part of the party-problem programming assignment is due on tuesday the 6th. The second part is due a week later.

Thanks to Andrew for pointing out that this was not indicated correctly on the web site. It has now been corrected. The course schedule is always your most reliable guide.

rich rich sutton, Thu Feb 1 13:39:33 2007

I have reworked the schedule so that the assignment for chapter is due on tuesday the 13th, the next programming assignment due the 15th, and so on.

To complete our discussion of state, and make sure you got it, here is a little puzzler, a micro-assignment for next class:

State Assignment:

Consider a world of a single grid cell one side of which is open (if you face this side the sensation is 0; otherwise it is 1). There are three actions: turn right, turn left, and spin. Spin changes your orientation to a random direction, 25% to each of the four.

How many states are there (in the Monad sense) and what are they?

Turn your answers in on paper for thursday's class.
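For anyone who wants to experiment with action and sensation sequences, here is a hypothetical Python simulator of the world described above. It deliberately does not answer the question; it only makes the dynamics concrete. Treating direction 0 as the open side and numbering orientations clockwise are assumptions for illustration. The orientation is hidden; only the sensation is meant to be visible to the agent.

```python
import random

OPEN_SIDE = 0  # assumption: which of the four directions (0-3) is the open side


class SingleCellWorld:
    """One grid cell; the agent senses only whether it faces the open side."""

    ACTIONS = ("turn_right", "turn_left", "spin")

    def __init__(self, orientation=0, rng=None):
        self.orientation = orientation  # hidden state: 0, 1, 2 or 3, clockwise
        self.rng = rng if rng is not None else random.Random()

    def sensation(self):
        # 0 when facing the open side, 1 when facing any of the three walls
        return 0 if self.orientation == OPEN_SIDE else 1

    def step(self, action):
        if action == "turn_right":
            self.orientation = (self.orientation + 1) % 4
        elif action == "turn_left":
            self.orientation = (self.orientation - 1) % 4
        elif action == "spin":
            self.orientation = self.rng.randrange(4)  # each direction with prob. 0.25
        else:
            raise ValueError(f"unknown action: {action}")
        return self.sensation()
```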

dear class,

only two people got the state mini-assignment correct: vlad and elliot (sorry andrew b., your number was correct but not your explanation). so i'd like everybody else to try it again. here's a hint: the correct answer is greater than 5.

turn it in on tuesday, with the chapter 4 exercises.

rich Anonymous, Thu Feb 8 14:16:57 2007

Hi all,

For those of you that wish to write python agents and/or environments with RL-Glue, I have added 3 new projects to RL-Library. They illustrate using:

A Python agent with a C environment
A Python environment with a C agent
A Python agent with a Python environment

See the Project Shelf of RL-Library (/RLR/project.html).

Also, please note RL-Glue cannot be downloaded from RL-Library. The library contains agents, environments, experiments and Projects only! Download RL-Glue from Sourceforge (see: http://sourceforge.net/projects/rl-glue)

Cheers, Adam White, Wed Mar 14 13:47:48 2007

In the book, the player hits automatically until the player's sum is 12 or greater. Below is a modified environment, following the description in the book, so that you only have to learn a policy for sums of 12 or greater.

http://www.cs.ualberta.ca/~butcher/Blackjack.cpp Anonymous, Sun Mar 18 16:53:59 2007
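For those working in Python rather than with the linked C++ file, here is a minimal sketch of the automatic hitting phase. It is not a port of Blackjack.cpp; it assumes, as the book does, an infinite deck (cards drawn with replacement), face cards counting 10, and one ace counted as 11 whenever that does not exceed 21. The function names are hypothetical.

```python
import random


def draw_card(rng):
    """Draw from an infinite deck: 1 = ace, 2-9 face value, 10 for 10/J/Q/K."""
    return min(rng.randint(1, 13), 10)


def hand_value(cards):
    """Return (sum, usable_ace), counting one ace as 11 if that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def deal_player_until_12(rng):
    """Hit automatically until the player's sum is 12 or more, as in the book."""
    cards = [draw_card(rng), draw_card(rng)]
    while hand_value(cards)[0] < 12:
        cards.append(draw_card(rng))
    return cards
```

Because the player's sum is at most 11 before each automatic hit and no card is worth more than 10 (an ace drops back to 1 if 11 would bust), the hand never busts during this phase, so the learned policy only has to cover sums from 12 to 21.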

in class we thought april 10 was the last day of class, but actually it is the 12th. this affects the schedule for the students presentations, which will now be on the 10th and 12th, in the same order as previously planned. -rich

i've made some changes to the schedule. we will do more FA on thursday and i am moving the rest later by one class. i'll be accepting the 1st FA assignment on thursday without penalty.

rich Anonymous, Tue Mar 27 15:00:22 2007

the schedule of presentations and practice talks has been added to the website. let me know asap if any of the times will not work for you.
rich Anonymous, Mon Apr 2 22:21:04 2007

the final programming project is nominally due thursday, and that would be a good time to turn it in, but i will accept them without penalty at the final exam on the 19th.
-rich Anonymous, Tue Apr 10 12:43:32 2007

This open web page is hosted at the University of Alberta.