Reinforcement Learning and Artificial Intelligence (RLAI)

CMPUT 607: Reinforcement Learning in Artificial Intelligence

The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 607 (a course at the University of Alberta in Fall 2007). It also links to slides for the course and to related courses elsewhere. New items are shown in red.
If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page.  Then you will be kept apprised of announcements related to the course.
You can add comments or questions to any web page that has an "Extend" link at the bottom. If you check the notify box, you are more likely to get a timely answer.
Written exercises should be submitted on paper at the beginning of class on the day they are due.
See the course schedule for the list of exercises to be completed for each chapter and, more generally, for the assignment due at each class. If the due date given in a programming exercise differs from the one in the schedule, the schedule takes precedence.

Topics and slides

Rich's old slides
Chapter 1 All exercises from the book
Evaluative feedback
Chapter 2 Exercises 2.17, 2.20, 2.21, 2.23, 2.27 and programming exercise 2.28 from here
Chapter 2
The RL problem (MDPs, value functions, optimality)
Chapter 3
Homework 3. Due date changed to Sep. 25
Chapter 3
Dynamic programming + linear programming
Chapter 4
Homework 4. Due date is Oct. 2
Chapter 4
Monte Carlo methods
Chapter 5 Exercises 5.7 and 5.8 from here. Due date is Oct. 4
Chapter 5
Temporal-Difference learning
Chapter 6
Homework 6. Due date is Oct. 9
Chapter 6
Eligibility traces
Chapter 7
Homework 7. Due date is Oct. 16
Chapter 7
Generalization and function approximation
Chapter 8 
Programming Exercise 8.8 of
Homework 8. Due date is Nov. 1
Chapter 8
Planning and learning, Prioritized Sweeping, dimensions of RL
Chapter 9
Chapter 9
Policy gradient and actor critic (Reading Material)
Homework. Due date is Nov. 20

Least squares methods
Chapter 13:) (NEW, Dec. 10)
Homework (NEW). Due date is Nov. 29
Hierarchical RL

Source files (PowerPoint) as a tar archive (Jan 06)

Some Related Courses Elsewhere

About the programming assignments:
- There are no requirements on what programming language to use.
- You must send me your program by e-mail with a subject in the following format:
Here, <space> is a single space character, <SID> is your student ID, and <ENUM> is the number of the exercise you are submitting code for (e.g., 2.28).

Some people asked what regret is, so I clarify it here:

The regret is the difference between what you could have gained by always selecting the optimal arm and what you actually gained.
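To make the definition concrete: if mu* is the mean reward of the optimal arm and mu(a_t) is the mean reward of the arm pulled at step t, the expected regret after n pulls is n*mu* - sum_{t=1..n} mu(a_t); using the rewards actually received instead of mu(a_t) gives the realized regret. The sketch below is only an illustration, not part of any assignment: the Bernoulli arms and the function name ucb1_regret are assumptions. It runs UCB1 (index: empirical mean plus sqrt(2 ln t / n_i), as in Auer et al.) and accumulates the expected regret.

```python
import math
import random


def ucb1_regret(arm_means, n_steps, seed=0):
    """Run UCB1 on a Bernoulli bandit (an assumed test problem) and return
    (expected_regret, pull_counts) after n_steps pulls."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # n_i: how often arm i has been pulled
    sums = [0.0] * k    # total reward observed from arm i
    best_mean = max(arm_means)
    regret = 0.0
    for t in range(n_steps):  # t = number of pulls made so far
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_i)
            arm = max(range(k), key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        # Bernoulli reward: 1 with probability arm_means[arm], else 0
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Expected regret: gap between the best mean and the mean of the pulled arm
        regret += best_mean - arm_means[arm]
    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb1_regret([0.9, 0.1], n_steps=2000)
    print(f"regret = {regret:.1f}, pulls per arm = {counts}")
```

With arms of means 0.9 and 0.1, UCB1 should pull the better arm far more often, so the regret grows roughly logarithmically rather than linearly in the number of pulls.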

See my slides on this topic (first few pages):
or the paper by Auer et al. introducing UCB:

- Csaba  

I have posted the homework for Chapter 8 (this is a programming assignment).
I suggest that you start early: you will need to run quite a few experiments and then consolidate the results into a nice report.
You can study for the midterm while the experiments are running :)

Question that I received: "I wonder how to choose the number of features. For example, if we use RBFs as the approximator, how do we choose a suitable number of RBFs to approximate the state-action value function?
  Furthermore, how do we decide the centers and the widths of the RBFs? Is that arbitrary? For broad or narrow generalization we should choose a large or small standard deviation, respectively; are there rules that give us the precise quantity?"

How to choose the number of features:
A large number of features is bad because it might take too much time to tune them, for two reasons: there are many parameters, and there is the problem of narrow valleys that we talked about in class.
A small number of features is bad since then you cannot capture the important aspects of the value function(s).
Given an infinite number of samples, the more features you have, the better the final performance will be (assuming that you decrease the step-size).
Practical advice: Look at relevant papers on the same subject and follow what they did. This is good if you have a limited amount of time and thus no chance to explore the problem fully. An added bonus is that you can compare your results with those of the paper(s).
Regarding which widths give broad or narrow generalization: generalization is narrow when the RBFs barely overlap each other, and broad when the RBFs overlap strongly.
One purpose of the exercise is to force you to think seriously about these issues. Hence I won't tell you anything more! Be sensible!  
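One way to build intuition for the width question is to look at how strongly two neighboring RBFs are activated halfway between their centers: a value near 0 means they barely overlap (narrow generalization), a value near 1 means they overlap strongly (broad generalization). This is a sketch, not the intended solution; Gaussian RBFs, evenly spaced centers, a single shared width sigma, and all function names are assumptions for illustration. Note also that with n centers per dimension on a grid, a d-dimensional state needs n**d features, which is one reason large feature sets get expensive.

```python
import math


def evenly_spaced_centers(low, high, n):
    """n RBF centers spread uniformly over [low, high] (needs n >= 2)."""
    if n < 2:
        raise ValueError("need at least two centers")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def rbf_features(x, centers, sigma):
    """Gaussian RBF features phi_i(x) = exp(-(x - c_i)^2 / (2 sigma^2)) for a scalar x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def midpoint_overlap(spacing, sigma):
    """Activation of either of two neighboring RBFs halfway between their centers."""
    return math.exp(-((spacing / 2.0) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    centers = evenly_spaced_centers(0.0, 1.0, 10)
    spacing = centers[1] - centers[0]
    for ratio in (0.25, 0.5, 1.0, 2.0):  # sigma as a multiple of the center spacing
        sigma = ratio * spacing
        print(f"sigma = {ratio} * spacing -> midpoint overlap {midpoint_overlap(spacing, sigma):.3f}")
    print(rbf_features(0.5, centers, 0.5 * spacing))
```

With sigma equal to a quarter of the center spacing, the midpoint activation is exp(-2), about 0.14 (narrow); with sigma equal to the spacing it is exp(-1/8), about 0.88 (broad). Where exactly "narrow" ends and "broad" begins remains a judgment call.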

Several people requested an extension for the latest programming assignment.
I decided to grant it.
However, nothing is free, so there will be penalties for late submissions.
The rule is this:
Only your most recent submission will count.
(You cannot say: "Oops, I did not want to submit that; please revert to my previous submission.")
You can submit parts of the solution.
You have only one chance to submit what you want to submit (you cannot submit in pieces).
If you submit on day X from now (X is a random variable, why not?), then the submitted part of your solution will receive its full score multiplied by F[X], where
F[1] = 0.9
F[2] = 0.8
F[3] = 0.7
F[4] = 0.6
F[5] = 0.5
F[i] = 0 for i > 5.
For example, if you submit tomorrow (Friday) and your solution is perfect, you will get 90% of the full mark.
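The penalty rule is just a lookup table. A minimal sketch follows; the names LATE_FACTORS, late_factor, and late_score are hypothetical, and since the rules only define F[X] for X >= 1, other days are rejected rather than guessed.

```python
# F[X] from the rules: the multiplier for a submission made on day X.
LATE_FACTORS = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}


def late_factor(day):
    """Return F[day]; any day after day 5 gives 0."""
    if day < 1:
        raise ValueError("F[X] is only defined for X >= 1")
    return LATE_FACTORS.get(day, 0.0)


def late_score(full_score, day):
    """Score for the submitted part: its full score times F[day]."""
    return full_score * late_factor(day)


if __name__ == "__main__":
    # A perfect solution submitted tomorrow (day 1) earns 90% of the full mark.
    print(late_score(100, 1))
```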
I hope that the rules are clear.

Whatever part is missing of what you submitted, you can still submit the missing
If you submit your code/solution by m  

you can slip your solution under my door or wait until Monday.
The date when you submit by e-mail is what counts.
The hardcopy solution must be an exact printout of what you submitted by e-mail.

Final Exam: Time, Place, and More

1) The final exam will be in NRE 2-127 on Dec. 12th at 2pm.

2) The exam will be comprehensive.

3) I will hold an office hour on Monday from 3pm to 4pm, so stop by if you have any questions. My office is CS 3-55.

This open web page is hosted at the University of Alberta.