CMPUT 499/609

	Reinforcement Learning and Artificial Intelligence (RLAI)
	CMPUT 499/609: Reinforcement Learning in Artificial Intelligence

The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 499/609 (a course at the University of Alberta in Spring 2006). There are also pointers here to slides for the course and to related courses elsewhere.

If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page. Then you will be kept apprised of announcements related to the course.

You can add comments or questions to all web pages with an "Extend" link at the bottom of the page. If you click the notify box you might get a timely answer.

For the time being, thought questions should be emailed to anna@cs.ualberta.ca.

Written exercises should be submitted on paper to Brian Booth at the beginning of class on the day they are due. Please submit these in duplicate - an original and a photocopy. The original will be marked and returned to you.

Course description, basic info on the course
Online Marks Posting (updated April 26, 2006)
Schedule
Online version of the textbook
Thought questions
The Party Problem (Programming Assignment)
The Monte Carlo Blackjack Programming Assignment (new, February 10, 2006, revised very slightly Feb 13)
Function Approximation Assignments (new: March 9, 2006, updated: March 17, 2006)
Proof of the futility of discounting when using function approximation
Python information
Additional exercise for Chapter 2
RL-Library and RL-Glue - a standard for connecting RL agents and environments
Worldwide reinforcement learning research
Chapter II of Ross (1983)
mini-project information (new March 28)
LSTD(lambda) by Boyan (Reading for Mar 23)
FAQ (please refer to this before sending questions to the TA)
Refurbished versions of the pseudocode algorithms in Figures 8.8 and 8.9 are here for Sarsa(lambda) with FA and here for Watkins's Q(lambda) with FA.
The reinforcement learning hypothesis
The value function hypothesis
Graphing software in python
The RLtoolkit python package

Slides

Web viewable (slightly old):

Source files (powerpoint) as a tar archive (Jan 06)

Related Courses Elsewhere

Peter Stone's course at the University of Texas at Austin

a student's ideas on exercise 7.3

The online exercises for Chapter 2 have been updated to hopefully save you some work. Also, when writing the pseudocode for question 2.5, please keep it simple. There should be no need to explain data structures or implementation details of that type. For examples of the level of detail expected for this question, refer to the pseudocode in the book, particularly Figure 4.1 (p.92) and Figure 5.1 (p.113).

To give you a sense of how your assignment will be graded, here's a likely mark distribution for this exercise:
2.1: 2pts.
2.5: 8pts.
2.55: 28pts.
2.8: 2pts. (extra credit) TA Brian, Mon Jan 16 16:03:06 2006

The description of the party problem is now online. Hopefully, it's described just the way you remember it. Also, here is the mark distribution for the Chapter 3 exercises:

Questions 3.5 & 3.11 - 4 pts. each
Questions 3.9 & 3.15 - 5 pts. each
Questions 3.4, 3.10, & 3.17 - 6 pts. each
Question 3.8 - 8 pts.
Question 3.6 (Extra Credit) - 2 pts.

Finally, handouts of the solutions to the exercises from chapters 2 & 3 will be made available at the next lecture for all those who turned in answers. Until then, keep fit and have fun.

Hi all,

The description of the second programming assignment is now online (on the party problem page). If the description is unclear, let me know. Also, here is the mark distribution for the Chapter 4 written exercises:

Exercise 4.1 - 6 pts.
Exercise 4.2 - 7 pts.
Exercise 4.3 - 9 pts.
Exercise 4.5 - 8 pts.
Exercise 4.9 - 4 pts. Brian (TA), Tue Jan 31 14:19:32 2006

Exercise 4.9 refers to equation (4.10), which should be interpreted as having two parts (really two equations). You should do both equations (2 pts for each). Rich, Tue Jan 31 15:51:36 2006

Beginning on Tuesday Feb 7, CMPUT 499/609 will meet in CSC-B43.
cu there rich, Fri Feb 3 16:48:44 2006

Due to the tendency of UofA students to party all night (depending on what you call a party), the due date for the DP/Party programming assignment is extended until Thursday, February 9th. Rich Sutton, Mon Feb 6 14:22:57 2006

Your marks for the written & programming exercises are now online (see link on course page). The marks are posted by assigned 'Mark ID' since I don't have people's student IDs. You will find your Mark ID written on your first programming assignment when I return it to you in class today. Brian, Tue Feb 7 08:53:56 2006

Regarding your RL-Glue assignment:

I have updated the code on the website. This was to correct a couple of things in the build script for Linux users. PLEASE GET THE NEW VERSION.

I encourage everyone to use the discussion and FAQ pages on /RLBB/top.html. This way you can extend the pages and ask me questions in a public way so that the whole class will benefit. Also, feel free to email me any questions or concerns regarding the "Glue".

Cheers,
Adam - awhite@cs.ualberta.ca Adam White, Thu Feb 9 13:57:07 2006

In case you missed the RL-Glue / RL-Library lecture, you can get my slides here: /RLL/classPres.pdf

Cheers,
Adam Adam White, Thu Feb 9 14:29:22 2006

The Blackjack programming assignment description is now available on the webpage. If it's not clear, please let me know. Also, the mark distribution for the Chapter 5 written exercises is:

Exercise 5.1 - 3 pts.
Exercise 5.2 - 4 pts.
Exercise 5.5 - 4 pts. Brian, Fri Feb 10 14:18:00 2006

The Blackjack environment is now available on the RL-library environment page.

Cheers,
Adam Anonymous, Fri Feb 10 14:55:10 2006

Note that I made a mistake with the mark distribution of the Chapter 5 written exercises. You should be answering exercise 5.1, 5.2, and 5.5, not exercise 5.3. Sorry about that. Brian, Sat Feb 11 10:21:09 2006

The MC programming assignment link does not seem to be working. Jo, Mon Feb 13 13:56:33 2006

The MC programming assignment link should be working now. Brian , Mon Feb 13 14:12:30 2006

According to the new requirement in the programming assignment that each card is limited in number, it seems that a state is now defined by the dealer's shown card and the number of each card remaining in the deck, rather than simply by 3 variables: player's sum, usable ace, dealer's shown card. If that is true, then the state space has too many dimensions and it is impossible to graph it as in the text. zhi, Mon Feb 13 23:58:42 2006

The description of the modified blackjack was a little ambiguous on this point, but the state space remains the same - you do not need to keep track of which cards have been dealt because they are dealt with replacement.

I've re-worded the problem description so that this should now be crystal clear. Please speak up again if it's not. Rich, Tue Feb 14 00:46:55 2006
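To make the point about the unchanged state space concrete, here is a minimal Python sketch. It assumes the standard textbook blackjack rules rather than the exact rules of the modified assignment, and the names `BlackjackState`, `CARD_VALUES`, `draw_card`, and `hit` are hypothetical, not taken from the RL-Library environment. Because cards are drawn with replacement, every draw uses the same distribution, so the three variables are enough and no deck counts are needed.

```python
import random
from typing import NamedTuple


class BlackjackState(NamedTuple):
    """The three state variables; no record of which cards have been dealt."""
    player_sum: int       # current sum, counting a usable ace as 11
    usable_ace: bool      # True if an ace is currently counted as 11
    dealer_showing: int   # dealer's face-up card (ace = 1)


# Assumed card values: ace..nine once each, ten four times (10, J, Q, K).
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw_card(rng):
    """Draw with replacement: the distribution never depends on past draws."""
    return rng.choice(CARD_VALUES)


def hit(state, rng):
    """Return the next state after the player takes one more card."""
    card = draw_card(rng)
    total = state.player_sum + card
    usable = state.usable_ace
    if card == 1 and total + 10 <= 21:
        # A new ace can be counted as 11 without going bust.
        total += 10
        usable = True
    elif total > 21 and usable:
        # Demote the usable ace from 11 to 1 to avoid going bust.
        total -= 10
        usable = False
    return BlackjackState(total, usable, state.dealer_showing)


if __name__ == "__main__":
    rng = random.Random(0)
    print(hit(BlackjackState(13, False, 10), rng))
```

The next state depends only on the current `BlackjackState` and the card drawn, which is why the value function can still be graphed over the same axes as in the text.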

The cat mouse environment is now available on the environment shelf of the RL-Library. Please feel free to point out any bugs or inefficiencies in the code as I converted it fairly quickly this morning.

Cheers,
Adam Anonymous, Tue Feb 14 12:17:15 2006

I found a bug in the Blackjack code that may have been affecting the performance on some systems. Updated code can be found in the RL-Library on the environment shelf. Adam, Tue Feb 14 17:52:38 2006

Hi all,

The midterm has been marked and the marks are posted online. You will have a chance to view your midterm in today's class, but because of the amount of material that remains to be covered in the course, we won't be discussing it in detail. If you have any questions about the midterm, or how it was marked, please come see me outside of class time.

A couple of updates about future assignments:

(a) The Sarsa programming assignment (cat and mouse) has been cancelled. However, there will be a future Function Approximation assignment that will involve TD methods (its description will be up soon-ish).

(b) The mark distribution for the written exercises of chapter 7 (due Tuesday) is as follows:
Exercise 7.2 - 3 pts.
Exercise 7.6 - 6 pts. Brian, Thu Mar 9 08:40:10 2006

Hey all,

First off, the first function approximation programming assignment is now online.

As for the marking of the backup diagrams on the midterm, let me first explain my reasoning. In the RL framework we've been looking at, we don't consider actions independently. It doesn't really make sense to ask how good is an action. Instead, we look at actions from a given state: state-action pairs. The same holds for the Bellman equations and update rules. It doesn't make sense to talk about the value of an action on its own. Backup diagrams are a way of visually representing these equations. The hope is that, with one, the other can be quickly determined. So backup diagrams should also refer to state-action pairs instead of actions.

Rich and I discussed this, and though he understood my reasoning, he thought I was being too harsh given that it was an exam environment. He also said I should lighten up and stop watching Sean Penn movies and Law & Order.

So in conclusion:
(a) Rich is right.
(b) I've made my point
(c) Your midterm marks have been corrected Brian, Thu Mar 9 20:15:16 2006
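As an illustration of the point about state-action pairs, here is a minimal Python sketch of a tabular Sarsa update in which every value is indexed by a (state, action) pair and never by an action alone. The function name `sarsa_update`, the state and action labels, and the default step size and discount are assumptions made for this example; they are not from the midterm or the course code.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """One Sarsa backup: Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Action values live on (state, action) keys; unseen pairs default to 0.0.
q = defaultdict(float)
sarsa_update(q, "s0", "right", 1.0, "s1", "left")
print(q[("s0", "right")])
```

Both ends of the backup, `(s, a)` and `(s_next, a_next)`, are state-action pairs, which mirrors the nodes at the top and bottom of the corresponding backup diagram.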

Hey all,

The description of the second function approximation assignment is now available online (on the same page as the description of the first function approximation assignment). If the description is unclear, please let me know. The assignment is due March 28th.

Also, here's the mark distribution of the Chapter 8 written exercises:
Exercise 8.1 - 3 pts.
Exercise 8.2 - 4 pts.
Exercise 8.6 - 3 pts.
Exercise 8.7 - 2 pts. Brian, Fri Mar 17 14:26:42 2006

For those of you who chose to use the Quickgraph software for the function approximation assignment, note that there is a bug in the program. On line 300 of graph3d.py, it currently says:

wzmings = window[2][1]

But it should really say:

wzmaxgs = window[2][1]

since wzmings is set properly on the line before it and wzmaxgs is not being set. You may need to fix this yourself. Sorry for the inconvenience.

Thanks to David Thue for finding this bug and the fix. Brian, Mon Mar 20 14:16:09 2006

For the mountain car assignment, don't try to do anything fancy in implementing eligibility traces. Don't try to keep track of which traces are significantly non-zero and decay only them. Just pick the memory size reasonably small so that it doesn't take forever to do the trace decay step.

This assignment is challenging to complete in a week. In this course I don't usually get as far as a complete program for RL including function approximation and traces. It is an important, non-trivial step. Congratulations. rich, Tue Mar 21 13:55:20 2006
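The "nothing fancy" advice on eligibility traces can be sketched as follows. This is a minimal Python illustration, assuming binary features, replacing traces, and a linear weight vector; the function name `trace_step` and its arguments are hypothetical and not part of the assignment code. Every trace is decayed on every step, so the cost per step grows with the memory size, which is why a reasonably small memory keeps it fast.

```python
def trace_step(weights, traces, active, td_error, alpha, gamma, lam):
    """Update all weights from the TD error, then decay every trace."""
    # Replacing traces: features active on this step get a trace of 1.
    for i in active:
        traces[i] = 1.0
    # Plain loop over the whole memory; no bookkeeping of "significant" traces.
    for i in range(len(weights)):
        weights[i] += alpha * td_error * traces[i]
        traces[i] *= gamma * lam


if __name__ == "__main__":
    memory_size = 8  # assumed small for the example
    weights = [0.0] * memory_size
    traces = [0.0] * memory_size
    trace_step(weights, traces, active=[1, 3], td_error=2.0,
               alpha=0.5, gamma=1.0, lam=0.9)
    print(weights, traces)
```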

Hi, here's a paper related to Curtis' comment on GPU matrix algorithms
http://graphics.stanford.edu/papers/gpumatrixmult/gpumatrixmult.pdf

-zhi Anonymous, Thu Mar 23 17:47:49 2006

Hey all,

Just a quick note to give you the distribution of points for the Chapter 9 written exercises (the last ones, yay):

Exercise 9.1 - 4 pts.
Exercise 9.2 - 3 pts.
Exercise 9.3 - 3 pts.
Exercise 9.5 - 6 pts.
Exercise 9.6 - 2 pts. (Extra Credit) Brian, Tue Mar 28 16:29:30 2006

this is a heads up that there will be a short reading assignment with thought questions due on tuesday, a week from today, the same day the mini-project proposal is due. see this space for the url to the reading. rich, Tue Mar 28 17:23:14 2006

the reading for tuesday is http://www.cs.ualberta.ca/~bowling/papers/01ijcai.pdf Anonymous, Tue Mar 28 17:28:53 2006

Hey,

Just a reminder to those in CMPUT 609 that if you haven't submitted your mini-project proposals to Rich or me yet, make sure you do so before the end of the day. Thanks Brian, Tue Apr 4 13:34:06 2006

Hey all,

The final exam marks are now online. If you wish to view/discuss your exam, you can come visit me in my office sometime before 5pm next Wednesday (April 26). Thanks, Brian, Thu Apr 20 17:47:22 2006

This open web page is hosted at the University of Alberta.