Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 609: Reinforcement Learning for Artificial Intelligence
The aim of this web page is to be the official, central site for information, software, and handouts for the 2010 version of CMPUT 609 (a course at the University of Alberta). There are also pointers here to slides for the course and to related courses elsewhere. For up-to-date course materials, see Sutton's home page.
The marking of the written assignments will be done by Mohammad Shafiei (shafieik@cs.ualberta.ca), so you must get an electronic copy to him by class time on the day the assignment is due. Assignments must be turned in on time because answer sheets will be handed out on the day the assignment is due. There is a scanner in CSC B-23, accessible to all graduate students, that can be used for this. If your answers are already in PDF form, you can email them directly to Mohammad.
See the web page for the book and check out the "errata/notes" page.
Reading diary entries should be emailed to sutton@cs.ualberta.ca. He will read them and send back a cryptic marking: points out of four. It is no longer necessary to bring a hardcopy of your entry to class on the day it is due.
Exercise 2.5 asks for pseudocode, meaning a way of writing the code that makes the algorithmic ideas clear without presuming any particular language and without worrying about all the details. There should be no need to explain data structures or implementation details of that type. For examples of the kind of pseudocode we are looking for, see that used in later chapters, such as in Figures 4.1 (page 92) or 6.9 (page 146). In addition, here is a template for the answer to this question:
Repeat for a = 1 to n:    ; initialization
    ...
Repeat forever:
    a = ...
    r = bandit(a)
    ...
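For illustration only, the template above can be fleshed out into runnable Python. This sketch assumes ε-greedy action selection with sample-average value estimates and a hypothetical stationary Bernoulli `bandit()`; it is one way to realize the template, not the required form of the answer.

```python
import random

n = 10            # number of bandit arms (assumed)
epsilon = 0.1     # exploration rate (assumed)
steps = 10000     # "Repeat forever" truncated to a fixed number of steps

# A hypothetical bandit: each arm pays 1 with its own fixed probability.
true_p = [random.random() for _ in range(n)]
def bandit(a):
    return 1.0 if random.random() < true_p[a] else 0.0

# Repeat for a = 1 to n:  ; initialization
Q = [0.0] * n     # action-value estimates
N = [0] * n       # action counts

# Repeat forever:
for _ in range(steps):
    if random.random() < epsilon:
        a = random.randrange(n)                  # explore
    else:
        a = max(range(n), key=lambda i: Q[i])    # exploit the greedy action
    r = bandit(a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                    # sample-average update
```

Your pseudocode answer should stay at the level of the template; the point of this sketch is only to show that every line of the template corresponds to something concrete.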
A good survey paper on the neuroscience of reinforcement learning can be found here: http://dx.doi.org/10.2976/1.2732246.
Here is the problem that is due next Tuesday.
A man is standing in a queue waiting for service, with N people ahead of him. He knows the utility of waiting out the queue, r, and the probability p that a person will be served in unit time. On the other hand, he incurs a cost of c for every unit of time spent waiting. The problem is to determine his waiting policy if he wishes to maximize his expected return. Of course, if he quits the queue, he will not be served.
The problem is to set this up as an MDP (states, actions, rewards), write the Bellman optimality equation, and then solve it to identify the optimal policy.
If you have questions, do not hesitate to ask me.
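One way to sanity-check a candidate formulation numerically (not a substitute for the written solution): take the state to be the number of people still ahead, with actions "wait" and "quit", and iterate the Bellman optimality backup. The state encoding and the terminal conventions below are assumptions, not the official answer.

```python
def queue_values(N, r, p, c, iters=10000):
    """Value iteration for one assumed formulation of the queue problem.

    State n = number of people ahead (n = 0 means the man himself is
    next to be served).  Action 'quit' ends the episode with return 0;
    action 'wait' costs c for the unit of time, and with probability p
    the person at the front (the man himself, when n = 0) is served.
    """
    V = [0.0] * (N + 1)
    for _ in range(iters):
        newV = list(V)
        for n in range(N + 1):
            served = r if n == 0 else V[n - 1]          # who gets served next
            wait = -c + p * served + (1 - p) * V[n]     # Bellman backup for 'wait'
            newV[n] = max(0.0, wait)                    # 'quit' yields return 0
        V = newV
    return V
```

If `queue_values(N, r, p, c)[N] > 0`, waiting from the back of the queue beats quitting under this formulation.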
Here is the next problem, due next Thursday.
A driver is looking for parking on the way to his destination. Each parking place is free with probability p, independently of whether other parking places are free or not. The driver cannot observe whether a parking place is free until he reaches it. If he parks k places from his destination, he incurs a cost k. If he reaches the destination without having parked, the cost is C. The goal of the driver is to minimize the expected cost of parking.
Note that the driver knows p and C and how far he is from his destination. Help the driver by determining an optimal policy. For this, write the problem as an MDP, determine the Bellman optimality equation, and then find the optimal policy.
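As with the queue problem, a short backward recursion can be used to check a candidate Bellman equation numerically. The indexing convention below (V[0] = C at the destination, places visited in decreasing order of distance) is an assumption for illustration, not the required formulation.

```python
def parking_costs(K, p, C):
    """Backward recursion for one assumed formulation of the parking problem.

    V[k] = expected cost when the driver is about to reach the parking
    place k positions from the destination, before seeing whether it is
    free.  V[0] = C: he has reached the destination without parking.
    If place k is free (probability p), he parks there (cost k) only
    when that beats the expected cost V[k-1] of driving on.
    """
    V = [0.0] * (K + 1)
    V[0] = C
    for k in range(1, K + 1):
        V[k] = p * min(k, V[k - 1]) + (1 - p) * V[k - 1]
    return V
```

Under this formulation the optimal policy is a threshold rule: park at a free place k exactly when k <= V[k-1], which the recursion makes easy to tabulate.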
Here is the problem that is due the coming Thursday.
Assume that we are given a normed vector space V. Let T: V -> V be an L_T-Lipschitz operator, and let S: V -> V be an L_S-Lipschitz operator. Prove that the composition of T and S is an (L_T L_S)-Lipschitz operator.
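The expected argument is a two-step chain of the Lipschitz bounds; in LaTeX notation, the key inequality is (a sketch only, the write-up should spell out the definitions):

```latex
% For any x, y \in V, apply the Lipschitz property of T, then of S:
\|T(S(x)) - T(S(y))\| \le L_T \, \|S(x) - S(y)\| \le L_T L_S \, \|x - y\|,
% so T \circ S is (L_T L_S)-Lipschitz.
```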
Here is the mark distribution for the exercises:
2.1: 2 pts.
2.5: 8 pts.
2.55: 28 pts.
2.8: 2 pts. (extra credit)
Be sure to answer every sub-question in every exercise; that's how the marking is done.
Questions 3.5, 3.8, & 3.11 - 4 pts. each
Question 3.9 - 5 pts.
Questions 3.10 & 3.17 - 6 pts. each
Exercise 4.1 - 6 pts.
Exercise 4.2 - 7 pts.
Exercise 4.3 - 9 pts.
Exercise 4.5 - 8 pts.
Exercise 4.9 - 4 pts.
Exercise 5.1 - 3 pts.