Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Reinforcement Learning in Artificial Intelligence
The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 499/609 (a course at the University of Alberta in Spring 2006). There are also pointers here to slides for the course and to related courses elsewhere. If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page. You will then be kept apprised of announcements related to the course.
You can add comments or questions to any web page that has an "Extend" link at the bottom. If you click the notify box, you might get a timely answer.
For the time being, thought questions should be emailed to
anna@cs.ualberta.ca.
Written exercises should be submitted on paper to Brian Booth at the beginning of class on the day they are due. Please submit them in duplicate: an original and a photocopy. The original will be marked and returned to you.
The online exercise for Chapter 2 has been updated to hopefully save you some work. Also, when writing the pseudocode for question 2.5, please keep it simple. There should be no need to explain data structures or implementation details of that type. For examples of the level of detail expected for this question, refer to the pseudocode in the book, particularly Figure 4.1 (p. 92) and Figure 5.1 (p. 113).
To give you a sense of how your assignment will be graded, here's a likely mark distribution for this exercise:
2.1: 2pts.
2.5: 8pts.
2.55: 28pts.
2.8: 2pts. (extra credit)
The description of the party problem is now online. Hopefully, it's described just the way you remember it. Also, here is the mark distribution for the Chapter 3 exercises:
Finally, handouts of the solutions to the exercises from chapters 2 & 3 will be made available at the next lecture for all those who turned in answers. Until then, keep fit and have fun.
Hi all,
The description of the second programming assignment is now online (on the party problem page). If the description is unclear, let me know. Also, here is the mark distribution for the Chapter 4 written exercises:
Exercise 4.9 refers to equation (4.10), which should be interpreted as having two parts (really two equations). You should do both equations (2 pts for each).
Beginning on Tuesday Feb 7, CMPUT 499/609
will meet in CSC-B43.
cu there
Due to the tendency of UofA students to party all night (depending on what you call a party), the due date for the DP/Party programming assignment is extended until Thursday, February 9th.
Your marks for the written & programming
exercises are now online (see link on course page). The marks are
posted by assigned 'Mark ID' since I don't have people's student IDs.
You will find your Mark ID written on your first programming assignment
when I return it to you in class today.
Regarding your RL-Glue assignment:
I have updated the code on the website to correct a couple of things in the build script for Linux users. PLEASE GET THE NEW VERSION.
I encourage everyone to use the discussion and FAQ pages on /RLBB/top.html. This way you can extend the pages and ask me questions publicly, so that the whole class will benefit. Also, feel free to email me any questions or concerns regarding the "Glue".
Cheers,
Adam - awhite@cs.ualberta.ca
In case you missed the RL-Glue / RL-Library
lecture, you can get my slides here: /RLL/classPres.pdf
Cheers,
Adam
The Blackjack programming assignment
description is now available on the webpage. If it's not clear, please
let me know. Also, the mark distribution for the Chapter 5 written
exercises is:
The Blackjack environment is now available
on the RL-library environment page.
Cheers,
Adam
Note that I made a mistake with the mark distribution of the Chapter 5 written exercises. You should be answering exercises 5.1, 5.2, and 5.5, not exercise 5.3. Sorry about that.
The MC programming assignment link does not
seem to be working.
The MC programming assignment link should be
working now.
According to the new requirement in the programming assignment that the number of each card is limited, it seems that a state is now defined by the dealer's showing card and the number of each card remaining in the deck, rather than simply three variables: the player's sum, whether the player holds a usable ace, and the dealer's showing card. If that is true, then the state space has too many dimensions and it is impossible to graph it as in the text.
The description of the modified blackjack
was a little ambiguous on this point, but the state space remains the
same - you do not need to keep track of which cards have been dealt
because they are dealt with replacement.
I've re-worded the problem description so that this should now be
crystal clear. Please speak up again if it's not.
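To illustrate why dealing with replacement keeps the state space small, here is a minimal Python sketch. It assumes a standard 52-card composition (ace counted as 1, face cards as 10); the actual assignment's card counts may differ, and the names DECK, draw, hand_value, and state are hypothetical, not taken from the RL-Library code. Because every draw comes from the same unchanged deck, past cards carry no information about future ones, so the three variables from the text are still a complete state.

```python
import random

# Hypothetical deck composition (assumption): a standard 52-card deck with
# ace = 1 and face cards = 10. The assignment's deck may be different.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10] * 4


def draw(rng):
    """Draw one card WITH replacement: the deck is never depleted,
    so there is nothing about dealt cards that needs tracking."""
    return rng.choice(DECK)


def hand_value(cards):
    """Return (sum, usable_ace), counting one ace as 11 if that does not bust."""
    total = sum(cards)
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


def state(player_cards, dealer_showing):
    """The state is the same three variables as in the text."""
    player_sum, usable_ace = hand_value(player_cards)
    return (player_sum, usable_ace, dealer_showing)


if __name__ == "__main__":
    rng = random.Random(0)
    player = [draw(rng), draw(rng)]
    print(state(player, draw(rng)))
```

For example, an ace and a six against a dealer's ten is the state (17, True, 10), no matter which cards were dealt earlier.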
The cat mouse environment is now available
on the environment shelf of the RL-Library. Please feel free to point
out any bugs or inefficiencies in the code as I converted it fairly
quickly this morning.
Cheers,
Adam
I found a bug in the Blackjack code that may have been affecting performance on some systems. Updated code can be found in the RL-Library on the environment shelf.
Hi all,
The midterm has been marked and the marks are posted online. You will
have a chance to view your midterm in today's class, but because of the
amount of material that remains to be covered in the course, we won't
be discussing it in detail. If you have any questions about the
midterm, or how it was marked, please come see me outside of class time.
A couple of updates about future assignments:
(a) The Sarsa programming assignment (cat and mouse) has been cancelled. However, there will be a future Function Approximation assignment that will involve TD methods (its description will be up soon-ish).
(b) The mark distribution for the written exercises of chapter 7 (due
Tuesday) is as follows:
Exercise 7.2 - 3 pts.
Exercise 7.6 - 6 pts.
Hey all,
First off, the first function approximation programming assignment is
now online.
As for the marking of the backup diagrams on the midterm, let me first explain my reasoning. In the RL framework we've been looking at, we don't consider actions independently. It doesn't really make sense to ask how good an action is. Instead, we look at actions taken from a given state: state-action pairs. The same holds for the Bellman equations and update rules. It doesn't make sense to talk about the value of an action on its own. Backup diagrams are a way of visually representing these equations; the hope is that, given one, the other can be quickly determined. So backup diagrams should also refer to state-action pairs instead of actions.
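The point about state-action pairs shows up directly in how action values are stored and updated. Below is a minimal Python sketch of a tabular one-step Sarsa backup, offered only as an illustration: the function name sarsa_update, the default step size and discount, and the toy states "A" and "B" are assumptions, and terminal-state handling is omitted. The table is keyed by (state, action), never by action alone, and the target uses the next state-action pair, just as the Sarsa backup diagram does.

```python
from collections import defaultdict

# Action values are indexed by state-action pairs, never by actions alone.
Q = defaultdict(float)  # maps (state, action) -> estimated value


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One-step Sarsa backup (non-terminal case only; an assumption of this
    sketch): the target is built from the next state-action pair."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


if __name__ == "__main__":
    Q[("B", "left")] = 1.0
    # The same action "left" gets a different value in state "A".
    print(sarsa_update(Q, "A", "left", 0.0, "B", "left"))
```

Asking for "the value of left" has no answer here; only Q[("A", "left")] and Q[("B", "left")] exist, and they differ.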
Rich and I discussed this, and though he
understood my reasoning, he thought I was being too harsh given that it
was an exam environment. He also said I should lighten up and stop
watching Sean Penn movies and Law & Order.
So in conclusion:
(a) Rich is right.
(b) I've made my point
(c) Your midterm marks have been corrected
Hey all,
The description of the
second function approximation assignment is now available online (on
the same page as the description of the first function approximation
assignment). If the description is unclear, please let me know. The
assignment is due March 28th.
Also, here's the mark distribution of the Chapter 8 written exercises:
Exercise 8.1 - 3 pts.
Exercise 8.2 - 4 pts.
Exercise 8.6 - 3 pts.
Exercise 8.7 - 2 pts.
For those of you who chose to use the
Quickgraph software for the
function approximation assignment, note that there is a bug in the
program. On line 300 of graph3d.py, it currently says:
wzmings = window[2][1]
But it should really say:
wzmaxgs = window[2][1]
since
wzmings is set properly on the line before it and wzmaxgs is not being
set. You may need to fix this yourself. Sorry for the
inconvenience.
Thanks to David Thue for finding this bug and the fix.
For the mountain car assignment, don't try
to do anything fancy in implementing eligibility traces. Don't
try to keep track of which traces are significantly non-zero and decay
only them. Just pick the memory size reasonably small so that it
doesn't take forever to do the trace decay step.
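Here is a minimal Python sketch of the "nothing fancy" approach: every trace is decayed on every step by a plain sweep over the whole memory. It assumes linear Sarsa(lambda) with binary features and accumulating traces; MEMORY_SIZE, the function name step_update, and the default parameters (gamma = 1 for an undiscounted task, lam = 0.9) are hypothetical choices, not part of the assignment. The caller is assumed to have already computed the TD error delta for the current state-action pair, whose active feature indices are passed in.

```python
# Hypothetical memory size: keep it reasonably small so that decaying
# every trace on every step stays cheap.
MEMORY_SIZE = 4096


def step_update(weights, traces, active_features, delta,
                alpha=0.1, gamma=1.0, lam=0.9):
    """One linear Sarsa(lambda) update with binary features and
    accumulating traces. No bookkeeping of which traces are non-zero:
    the whole trace vector is decayed every step."""
    decay = gamma * lam
    for i in range(len(traces)):      # full sweep over the memory
        traces[i] *= decay
    for i in active_features:         # accumulate for the current features
        traces[i] += 1.0
    for i in range(len(weights)):     # move weights along the traces
        weights[i] += alpha * delta * traces[i]


if __name__ == "__main__":
    weights = [0.0] * MEMORY_SIZE
    traces = [0.0] * MEMORY_SIZE
    step_update(weights, traces, active_features=[3, 17], delta=1.0)
    print(weights[3], weights[17])
```

The cost per step is proportional to MEMORY_SIZE, which is exactly why a modest memory size keeps the decay step fast.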
This assignment is challenging to complete in a week. In this course I don't usually get as far as a complete RL program including function approximation and traces. It is an important, non-trivial step. Congratulations.
Hi, here's a paper related to Curtis'
comment on GPU matrix algorithms
http://graphics.stanford.edu/papers/gpumatrixmult/gpumatrixmult.pdf
-zhi
Hey all,
Just a quick note to give you the distribution of points for the
Chapter 9 written exercises (the last ones, yay):
This is a heads-up that there will be a short reading assignment with thought questions due on Tuesday, a week from today, the same day the mini-project proposal is due. See this space for the URL of the reading.
Just a reminder to those in CMPUT 609: if you haven't submitted your mini-project proposals to Rich or me yet, make sure you do so before the end of the day. Thanks.
Hey all,
The final exam marks are now online. If you wish to view/discuss your
exam, you can come visit me in my office sometime before 5pm next
Wednesday (April 26). Thanks,