Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Reinforcement Learning in Artificial Intelligence
The ambition of this web page is to be the official, central site for
information, software, and handouts for CMPUT 499/609 (a course at the
University of Alberta in Winter 2007). There are also pointers here to
slides for the course and to related courses elsewhere. New stuff is in
red. If you are taking the course in any capacity, please subscribe to
this web page by clicking the "subscribe" link at the bottom of the
page. Then you will be kept apprised of announcements related to the
course.
You can add comments or
questions to all web pages with an "Extend" link at
the bottom of the page. If you
click the notify box you might get
a timely answer. Thought
questions should be submitted in hardcopy in class on the day they are
due. Send them by email to sutton@cs.ualberta.ca only if hardcopy
submission on that day is
not possible for you.
Written exercises should be submitted on
paper at the beginning of class on the day they are
due.
See the course schedule for the
list of exercises to be completed for
each chapter and generally for the assignment due at each class.
If a due date in a programming exercise differs from
that given in the schedule, believe the schedule.
Pretty soon, if you are at all
mathematically inclined, you will want to read the Ross chapter below.
Here is the presentation schedule,
including practice talks:
April 4, 11:00    Renee: practice. Consult Mohammad at mgh@cs.ualberta.ca
April 4, 11:40    Hussam: practice. Consult Mohammad at mgh@cs.ualberta.ca
April 5, 11:55    Renee: presentation in class
April 5, 2:30     Wei Wei: practice. Consult Brian at btanner@cs.ualberta.ca
April 5, 3:10     Yasin: practice. Consult Brian at btanner@cs.ualberta.ca
April 10, 11:00-  Wei Wei, Hussam, and Yasin: presentation in class
April 10, 2:30    Varun: practice. Consult Adam at awhite@cs.ualberta.ca
April 11, 11:00   Vlad: practice. Consult David at silver@cs.ualberta.ca
April 11, 11:40   Andrew: practice. Consult David at silver@cs.ualberta.ca
April 12, 11:00   Varun, Vlad, and Andrew: presentation in class
When writing the
pseudocode for question 2.5, please keep it simple. There should be no
need to explain data structures or implementation details of that type.
For examples of the level of detail expected for this question, refer
to examples of pseudocode in the book, particularly Figure 4.1 (p.92)
and Figure 5.1 (p.113).
To give you a sense of how your assignment will be graded, here's a
likely mark distribution for this exercise:
2.1: 2pts.
2.5: 8pts.
2.55: 28pts.
2.8: 2pts. (extra credit)
Here is the mark
distribution for the Chapter 3 exercises.
Exercise 4.9 refers to equation (4.10),
which should be interpreted as
having two parts (really two equations). You should do both
equations (2 pts for each).
Regarding your RL-Glue assignment:
I encourage everyone to use the discussion and FAQ pages on
/RLBB/top.html.
This way you can extend the
pages and ask me questions in a public way so that the whole class will
benefit. Also, feel free to email me any questions or concerns
regarding the "Glue".
The mark distribution for the written
exercises of chapter 7 is as follows:
Exercise 7.2 - 3 pts.
Exercise 7.6 - 6 pts.
Here's the mark distribution of the Chapter 8 written exercises:
Exercise 8.1 - 3 pts.
Exercise 8.2 - 4 pts.
Exercise 8.6 - 3 pts.
Exercise 8.7 - 2 pts.
For those of you who chose to use the
Quickgraph software for the
function approximation assignment, note that there is a bug in the
program. On line 300 of graph3d.py, it currently says:
wzmings = window[2][1]
But it should really say:
wzmaxgs = window[2][1]
since
wzmings is set properly on the line before it and wzmaxgs is not being
set. You may need to fix this yourself. Sorry for the
inconvenience.
Thanks to David Thue for finding this bug and the fix.
For the mountain car assignment, don't try
to do anything fancy in implementing eligibility traces. Don't
try to keep track of which traces are significantly non-zero and decay
only them. Just pick the memory size reasonably small so that it
doesn't take forever to do the trace decay step.
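The simple approach described above can be sketched as follows. This is only an illustration under stated assumptions: binary features, replacing traces, and a linear weight vector. The names `MEMORY_SIZE` and `trace_step` are hypothetical and do not come from the assignment or from any provided software.

```python
MEMORY_SIZE = 2048  # assumed value; keep it small so a full pass stays cheap


def trace_step(weights, traces, active_features, delta, alpha, gamma, lam):
    """One learning step with replacing traces over binary features.

    Every trace is decayed on every step; nothing tracks which traces
    are significantly non-zero.
    """
    for i in active_features:
        traces[i] = 1.0                          # replacing trace for active features
    for i in range(len(weights)):
        weights[i] += alpha * delta * traces[i]  # credit every traced feature
        traces[i] *= gamma * lam                 # decay all traces, zero or not


weights = [0.0] * MEMORY_SIZE
traces = [0.0] * MEMORY_SIZE
trace_step(weights, traces, [3, 17], delta=1.0, alpha=0.1, gamma=1.0, lam=0.9)
```

The cost of each step is proportional to the memory size, which is why a modest memory size keeps the decay step fast.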
This
assignment is challenging to complete in a week. In this course I
don't usually get as far as a complete program for RL including
function approximation and traces. It is an important,
non-trivial step. Congratulations.
The distribution of points for the
Chapter 9 written exercises:
For those of us who don't have a hard copy
of the textbook yet: the
exercises for chapter 2 have the same numbers in the book as in the
online version. There is one small difference: online, in ex. 2.5, it
should be alpha = (1/k) instead of (alpha = 1) / k.
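To see what the step size alpha = 1/k does, here is a minimal sketch. It assumes the usual incremental update Q <- Q + alpha * (r - Q), applied to the k-th reward; the function name `sample_average` is hypothetical and is not taken from the book.

```python
def sample_average(rewards):
    """Average a sequence of rewards incrementally with step size alpha = 1/k."""
    q = 0.0  # initial estimate; the first update (alpha = 1) overwrites it
    for k, r in enumerate(rewards, start=1):
        alpha = 1.0 / k       # alpha = (1/k), not (alpha = 1) / k
        q += alpha * (r - q)  # move the estimate toward the new reward
    return q
```

With this step size, the estimate after k rewards equals their plain average.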
For the thought questions:
Note that the thought questions should be sent to
sutton@cs.ualberta.ca, not anna@cs.ualberta.ca as it used to say on
this page.
To help me find your thought questions in my inbox, please put
"[thought]" in your message title.
-RS
Regarding the textbook:
Apparently, and contrary to what i said in class, there are no copies
of the textbook in the bookstore.
Apparently, there is a pdf scanned copy of the book on the internet
available to UofA students. Varun will send the URL.
You can access the online version of the
textbook through the www.library.ualberta.ca website. Search for
"reinforcement learning sutton", and it should be the first link.
Follow the "UA Internet Access" link.
[there is now a direct link in the menu above]
Lots of changes to the course web page and
to the assignments for thursday. (aren't you glad you subscribed?)
-rs
if you have not already done so, i would
appreciate it if you officially registered for the course by the end of
tomorrow, which i think is the last day you can. if you are a
student in a department other than CS, then to do this you must see
edith drummond in the CS dept office on the 2nd floor of athabasca hall
before 4pm.
thanks,
rich
The
assignments due on thursday, january 18, have been changed, partly due
to our being a little bit behind, but more so that we can spend more
time on chapter 3, which is a little long and critical to
the course. See the schedule for the details, but basically chapter 2
exercises are pushed to next tuesday and a followup to the jeopardy
quiz (see highlighted menu item below) has been added for this thursday
in its place. If you don't see this note in time and do these in
the original order, that is fine.
Other events in the schedule are pushed back accordingly.
Note that thought questions are due on thursday for chapters 1 and 2 if
you have not sent them to me before. Although we are switching to
hardcopy submission for the future, you do not have to resubmit in
hardcopy what you have already submitted by email.
Please note that exercise 2.55 can be found in the menu below.
it is important to do well on the chapter 3
exercises. to make
sure that everybody can do this, i'd like to have another class on
chapter 3 before requiring the exercises to be completed. so, the
chapter 3 exercises will be due on thursday, one class later than
previously scheduled. i would still like to try to have the first
programming assignment due on that day as well, but we will have to see
how it goes.
-rich
On page 61, the upper limit in Equation 3.3
should be T-t-1.
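With this correction, the equation would read as below. This assumes the book's notation: R_t is the return, gamma the discount rate, r the rewards, and T the final time step.

```latex
R_t = \sum_{k=0}^{T-t-1} \gamma^k \, r_{t+k+1}
```

The last term, at k = T-t-1, is then the reward r_T received on the final step.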
Please note that only the first part of the
party-problem programming assignment is due on tuesday the 6th. The
second part is due a week later.
Thanks
to Andrew for pointing out that this was not indicated correctly on the
web site. It has now been corrected. The course schedule is
always your most reliable guide.
rich
I have reworked the schedule so
that the assignment for chapter is due on tuesday the 13th, the next
programming assignment due the 15th, and so on.
To complete our discussion of state, and make sure you got it, here is
a little puzzler, a micro-assignment for next class:
State Assignment:
Consider a world of a single grid cell one side of which is open (if
you face this side the sensation is 0; otherwise it is 1). There are
three actions: turn right, turn left, and spin. Spin changes your
orientation to a random direction, 25% to each of the four.
How many states are there (in the Monad sense) and what are they?
Turn your answers in on paper for thursday's class.
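If you want to experiment with this world, here is a minimal simulator sketch. It only encodes the description above and does not answer the question. The orientation encoding (0 to 3, with 0 facing the open side) and the names `sensation` and `step` are assumptions made for illustration.

```python
import random

OPEN_SIDE = 0  # assumed encoding: orientations are 0..3 and side 0 is open


def sensation(orientation):
    """Return 0 when facing the open side, otherwise 1."""
    return 0 if orientation == OPEN_SIDE else 1


def step(orientation, action, rng=random):
    """Apply one action and return the new orientation."""
    if action == "right":
        return (orientation + 1) % 4
    if action == "left":
        return (orientation - 1) % 4
    if action == "spin":
        return rng.randrange(4)  # each of the four directions with 25% chance
    raise ValueError("unknown action: " + repr(action))
```

Note that the agent only ever receives the sensation, never the orientation itself.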
dear class,
only two people got
the state mini-assignment correct: vlad and elliot (sorry andrew b.,
your number was correct but not your explanation). so i'd like
everybody else to try it again. here's a hint: the correct answer is
greater than 5.
turn it in on tuesday, with the chapter 4 exercises.
rich
Hi all,
For those of you that wish
to write python agents and/or environments with RL-Glue, I have added 3
new projects to RL-Library. They illustrate using:
A Python agent with a C environment
A Python environment with a C agent
A Python agent with a Python environment
See the Project Shelf of RL-Library
(/RLR/project.html).
Also,
please note RL-Glue cannot be downloaded from RL-Library. The library
contains agents, environments, experiments and Projects only! Download
RL-Glue from Sourceforge (see: http://sourceforge.net/projects/rl-glue).
Cheers,
In the book the player hits automatically
until the player has a sum of
12 or greater. Below is a modified environment so you only have
to learn a policy from a sum of 12 or greater, following the
description in the book.
http://www.cs.ualberta.ca/~butcher/Blackjack.cpp
in class we thought april 10 was
the last day of class, but actually it is the 12th. this affects the
schedule for the student presentations, which will now be on the 10th
and 12th, in the same order as previously planned. -rich
i've made some changes to the schedule.
we will do more FA on thursday and i am moving the rest later by
one class. i'll be accepting the 1st FA assignment on thursday
without penalty.
rich
the schedule of presentations and practice
talks has been added to the
website. let me know asap if any of the times will not work for you.
rich
the final programming project is nominally
due thursday, and that would
be a good time to turn it in, but i will accept them without penalty at
the final exam on the 19th.
-rich