CMPUT 607: Reinforcement Learning in Artificial Intelligence

Topic	Slides	Homework	Rich's old slides
Introduction	Chapter 1	All exercises from the book
Evaluative feedback	Chapter 2	Exercises 2.17, 2.20, 2.21, 2.23, 2.27 and programming exercise 2.28 from here	Chapter 2
The RL problem (MDPs, value functions, optimality)	Chapter 3	Homework 3 Due date changed to Sep. 25	Chapter 3
Dynamic programming + linear programming	Chapter 4	Homework 4 Due date is Oct. 2	Chapter 4
Monte Carlo methods	Chapter 5	Exercises 5.7 and 5.8 from here. Due: Oct. 4	Chapter 5
Temporal-Difference learning	Chapter 6	Homework 6. Due date is Oct. 9	Chapter 6
Eligibility traces	Chapter 7	Homework 7. Due date is Oct. 16	Chapter 7
Generalization and function approximation	Chapter 8	Programming Exercise 8.8 of Homework 8. Due date is Nov 1	Chapter 8
Planning and learning, Prioritized Sweeping, dimensions of RL	Chapter 9		Chapter 9
Policy gradient and actor critic (Reading Material)	Slides	Homework Due. Nov. 20
Least squares methods	Chapter 13:) (NEW, Dec. 10)	Homework (NEW) Due. Nov. 29
Hierarchical RL
			Source files (powerpoint) as a tar archive (Jan 06)

Some Related Courses Elsewhere

About the programming assignments:
- There are no requirements on what programming language to use.
- You have to send me your program by e-mail, with a subject in the following format:
"CMPUT607"<space><SID><space><ENUM>
Here <space> is a single space character, <SID> is your student id, and <ENUM> is the number of the exercise you are submitting your code for (e.g., 2.28).
Csaba Csaba, Thu Sep 13 19:16:13 2007
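
For illustration only, the subject format above could be assembled as in the sketch below. The function name and the student id 1234567 are made up, and the sketch assumes that the quotation marks around CMPUT607 are delimiters rather than part of the subject line.

```python
def submission_subject(student_id: str, exercise: str) -> str:
    """Build the subject: CMPUT607, student id, exercise number,
    each separated by exactly one space."""
    sid = student_id.strip()
    enum = exercise.strip()
    # Any embedded space would break the single-space format.
    if not sid or not enum or " " in sid or " " in enum:
        raise ValueError("student id and exercise number must be non-empty and contain no spaces")
    return f"CMPUT607 {sid} {enum}"


# Hypothetical student id, exercise 2.28 from the course page.
print(submission_subject("1234567", "2.28"))  # CMPUT607 1234567 2.28
```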

Hi,
Some people asked what the regret is, so let me clarify it here:

The regret is the difference between what you could have gained had you always selected the optimal arm and what you actually gained.

See my slides on this topic (first few pages):
http://www.cs.ualberta.ca/~szepesva/CMPUT654/lectures/lecture_ucb.pdf
or the paper by Auer et al. introducing UCB:
http://www.cs.ualberta.ca/~szepesva/CMPUT654/auer-finite-02.pdf

- Csaba Csaba, Mon Sep 17 18:33:42 2007
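
As a rough sketch of the definition of regret given above, the snippet below computes it for a simulated bandit. It assumes that "what you could have gained" means the expected total reward of always pulling the arm with the highest mean; the arm means, the random player, and the function name are made up for illustration. The slides and the paper linked above remain the reference for the precise definition.

```python
import random


def regret(arm_means, rewards):
    """Regret after len(rewards) pulls: expected total reward of always
    pulling the best arm, minus the total reward actually collected."""
    return len(rewards) * max(arm_means) - sum(rewards)


# Hypothetical 3-armed Bernoulli bandit; the means are illustrative only.
arm_means = [0.2, 0.5, 0.8]
rng = random.Random(0)

# A deliberately poor player that picks arms uniformly at random.
rewards = []
for _ in range(1000):
    arm = rng.randrange(len(arm_means))
    rewards.append(1.0 if rng.random() < arm_means[arm] else 0.0)

print(regret(arm_means, rewards))
```

A player that always pulled the best arm would have a regret that fluctuates around zero, while the uniform player above accumulates regret roughly linearly in the number of pulls.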

Hi,
I have posted the homework for Chapter 8 (this is a programming assignment).
I suggest you start early: you will need to run quite a few experiments and then consolidate the results into a nice report.
You can study for the midterm while the experiments are running :)
Bests,
Csaba Csaba Szepesvari, Fri Oct 19 23:46:02 2007


Question that I received: "I wonder how to choose the number of features? For example, if we use the RBFs as the approximator, how to choose the suitable number of RBFs to approximate state-action value function?
Furthermore, how to decide the centers and the widths of the RBFs? Is that arbitrary? In broad or narrow generalization, we should choose a large or small standard deviation, are there some rules to give us the precise quantity?"

How to choose the number of features:
A large number of features is bad since it might then take too much time to tune them, for two reasons: there are many parameters, and there is the problem of narrow valleys that we talked about in class.
A small number of features is bad since you then cannot capture the important aspects of the value function(s).
Given an infinite amount of samples, the more features you have, the better the final performance would be (assuming that you decrease the step-size).
Practical advice: Look at relevant papers on the same subject and follow what they did. This is good if you have a limited amount of time and thus no chance to explore the problem fully. An added bonus is that you can compare your results with those of the paper(s).
Regarding what width is considered as giving you broad or narrow generalization: Narrow would be when the RBFs barely cross each other. Broad would be the case when the RBFs are strongly overlapping.
One purpose of the exercise is to force you to think seriously about these issues. Hence I won't tell you anything more! Be sensible!
Csaba Szepesvari, Wed Oct 24 21:23:01 2007
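
To make the narrow versus broad distinction above concrete, here is a minimal sketch. It assumes a one-dimensional state, Gaussian RBFs, and five evenly spaced centers; all of these, including the two widths, are illustrative choices and not recommendations for the assignment.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF activations of a scalar state x, one per center."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


# Five hypothetical centers evenly spaced on [0, 1] (spacing 0.25).
centers = [i / 4 for i in range(5)]

# Evaluate at the middle center and inspect how active its neighbours are.
narrow = rbf_features(0.5, centers, width=0.05)  # width well below the spacing
broad = rbf_features(0.5, centers, width=0.25)   # width equal to the spacing

print("narrow:", [round(v, 3) for v in narrow])
print("broad: ", [round(v, 3) for v in broad])
```

With the narrow width the neighbouring RBFs are essentially inactive at a center, so an update there barely affects nearby states; with the broad width the neighbours are strongly active and the update generalizes widely.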


Several people requested an extension for the latest programming assignment.
I decided to give you this.
However, nothing is free, so there will be penalties for late submissions.
The rule is this:
Only your most recent submission will count.
(You cannot say: oops, I did not mean to submit that, please revert to my previous submission.)
You can submit parts of the solution.
You have only one chance to submit what you want to submit (you cannot submit in pieces).
If you submit on day X from now on (X is a random variable, why not?), then for the submitted part of the solution you will get a score of the full score times F[X], where
F[1] = 0.9
F[2] = 0.8
F[3] = 0.7
F[4] = 0.6
F[5] = 0.5
F[i] = 0, i>5.
For example, if you submit tomorrow (Friday) and your solution is perfect, you will get 90% of the full mark.
I hope that the rules are clear.
Bests,
Csaba
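
The penalty schedule above can be sketched as a small lookup. The names below are made up for illustration, and the sketch assumes that days are counted from 1, as in the table of F values.

```python
# Percentage of the full score kept when submitting on late day X (F[X] above).
LATE_PERCENT = {1: 90, 2: 80, 3: 70, 4: 60, 5: 50}


def late_score(full_score, day):
    """Score for the submitted part when it arrives on late day `day`."""
    if day < 1:
        raise ValueError("day counts late days and starts at 1")
    # Any day after the fifth earns nothing, matching F[i] = 0 for i > 5.
    return full_score * LATE_PERCENT.get(day, 0) / 100


print(late_score(100, 1))  # 90.0
print(late_score(100, 6))  # 0.0
```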

Whatever part is missing of what you submitted, you can still submit the missing
Sub
If you submit your code/solution by m
Csaba, Thu Nov 1 21:20:44 2007

You can slip your solution beneath my door or wait until Monday.
The date when you submit by e-mail is the one that counts.
The hardcopy solution must be the exact printout of what you submitted by e-mail.
Csaba Csaba, Sat Nov 3 09:57:43 2007

Final Exam: Time, Place, and More

1) The final exam will be in NRE 2-127 on Dec. 12th at 2pm.

2) The exam will be comprehensive.

3) I will hold an office hour on Monday from 3pm to 4pm, so stop by if you have any questions. My office is CS 3-55.
Mohammad Ghavamzadeh, Thu Dec 6 23:02:31 2007

