Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 609: Reinforcement Learning for Artificial Intelligence
The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 609 (a course at the University of Alberta in Winter 2009). There are also pointers here to slides for the course and to related courses elsewhere. See the schedule for assignments. See the web page for the book and check out the "errata/notes" page.
If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page. Then you will be kept apprised of announcements related to the course.
Reading diary entries should be emailed to sutton@cs.ualberta.ca. He will read them and send back a cryptic mark, out of four points.
Written exercises should be submitted on paper to Mohammad Shafiei at the beginning of class on the day they are due. Only one copy need be submitted. It will be marked and returned to you. Assignments must be turned in on time because answer sheets will be handed out on the day the assignment is due.
You can add comments or questions to any of these web pages using the "Extend" link at the bottom of the page. If you click the notify box, you might get a timely answer.
Exercise 2.5 asks for pseudocode, meaning a way of writing the code that makes the algorithmic ideas clear without presuming any particular language, and without worrying about all the details. There should be no need to explain data structures or implementation details of that type. For examples of the kind of pseudocode we are looking for, see that used in later chapters, such as in Figures 4.1 (page 92) or 6.9 (page 146). In addition, here is a template for the answer to this question:
Repeat for a = 1 to n:          ; initialization
    ...
Repeat forever:
    a = ...
    r = bandit(a)
    ...
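For a sense of the intended level of detail, here is how the same loop structure might look as runnable Python. This is only an illustration of what pseudocode is allowed to leave out; the epsilon-greedy action selection and sample-average updates shown here are one possible way of filling in the template, not necessarily the answer expected for the exercise.

import random

def run_bandit(bandit, n, epsilon=0.1, steps=1000):
    # initialization: one value estimate Q[a] and play count N[a] per arm
    Q = [0.0] * n
    N = [0] * n
    for _ in range(steps):
        # epsilon-greedy action selection (an assumption of this sketch)
        if random.random() < epsilon:
            a = random.randrange(n)
        else:
            a = max(range(n), key=lambda i: Q[i])
        r = bandit(a)                      # sample a reward from arm a
        # incremental sample-average update of the action value
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]
    return Q

# e.g., ten Gaussian arms with means 0..9:
# run_bandit(lambda a: random.gauss(a, 1.0), n=10)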
A good survey paper on the neuroscience of reinforcement learning can be found here: http://dx.doi.org/10.2976/1.2732246.
Dear class,
I am writing to provide a little more guidance on the class projects.
The truth is that I'm not exactly sure how best to structure them, or
how much to structure them. I want you to gain some experience
programming and using the learning algorithms. But I don't want to provide too much structure and inhibit your creativity, or prevent you from trying things that interest you. We will have to find a balance.
I don't know what is the best way to structure the projects, but I do
know some of the mistakes that you might make. So let me at least
explain some of those, and we'll see if you can avoid them.
The most common mistake is to try something too ambitious. Using
reinforcement learning algorithms is actually a bit more difficult than
it probably seems at this point. There are parameters, and there are
many issues in the setup of a problem, in the choice of reward, and
there are many many issues in representations and choice of state
space. The importance of all these things is understated in the book,
and is thus probably not very apparent. Perhaps because of this, a
common mistake is to start a project that is too ambitious, that
addresses some large problem involving many AI issues. In this case,
the project is unlikely to have a successful outcome in the available
time.
To deal with the risk of this mistake, I have urged you to think of the
first project in particular as a simple one, to be done quickly.
Further, you may want to think of the first project as being an application or illustration of what we have learned about in class and in the book, whereas the second project you might want to think of as going beyond that to have just a little bit of research content. (Learning to do research is one of the main things you have to do in graduate school.) So, a good choice for a first project could be
something like just writing an agent for mountain-car, or Acrobot, say
using function approximation with tile coding. That would be
conceptually straightforward and would start giving some experience. In
the past I have assigned exactly that (mountain car with tile coding) as a
project and viewed it as the final successful achievement of the
course. So getting that far should not be viewed as a small thing. It
would be a good outcome. (And in the second project, we should only be
trying to go a little bit farther.) Another nice thing about a very
straightforward project like that is it would be easy for me to
understand your results. It will make your write-up almost trivial for you to write, and for me to read. Thinking a little more generally, you might
take any problem from the RL library, or RL competition, describe and
implement an experiment with it, and that could be your first project.
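To give a sense of scale, the core of such an agent is quite small. Here is a rough sketch of the value-estimation and action-selection part with tile-coded features; the active_tiles argument stands for whatever tile-coding routine you end up using, so the names and interface here are my assumptions, not the actual library.

import random

def action_value(w, tiles):
    # with binary tile features, the estimated value is simply the sum
    # of the weights of the active tiles
    return sum(w[i] for i in tiles)

def choose_action(w, state, actions, active_tiles, epsilon=0.1):
    # epsilon-greedy over the tile-coded action values; active_tiles is
    # passed in so this sketch does not assume any particular tile coder
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: action_value(w, active_tiles(state, a)))

The learning part is then an ordinary linear Sarsa or Q-learning update applied to the weights of the active tiles.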
There are also simple things to do with the critterbot log files.
For example, you could take one of the signals and treat it as the reward. Then you could use the others to try to predict it at various time scales (that is, for various values of gamma). This would illustrate learning with TD(lambda). Then you could compare performance with Monte Carlo methods, or across various values of lambda.
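To make that concrete, here is a minimal sketch of what that prediction could look like with linear TD(lambda) and accumulating traces. The array shapes, and the use of the other logged signals directly as features, are assumptions on my part, not a description of the actual log-file format.

import numpy as np

def td_lambda_prediction(features, rewards, gamma=0.9, lam=0.8, alpha=0.01):
    # features: (T, n) array, one feature vector per time step,
    #           built from the other logged signals
    # rewards:  (T,) array, the signal chosen to play the role of reward
    n = features.shape[1]
    w = np.zeros(n)    # weights of the linear predictor
    z = np.zeros(n)    # eligibility trace
    for t in range(len(rewards) - 1):
        x, x_next = features[t], features[t + 1]
        delta = rewards[t] + gamma * np.dot(w, x_next) - np.dot(w, x)
        z = gamma * lam * z + x          # accumulating traces
        w += alpha * delta * z
    return w

Running this for several values of gamma gives predictions at several time scales, and rerunning it with different values of lam, with lam = 1 as the Monte Carlo-like end of the spectrum, gives the comparisons mentioned above.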
The main thing I'm looking for in the first project is to see you
implement a reinforcement learning algorithm, and actually try it and
get it to work on a problem. You could find versions of the algorithms
on the web and in the library, but I'd like to see that you understand
enough to write your own agent, or at least be able to modify an
existing agent. So if you do work with something that you found, it
would be good if you modified it somewhat, perhaps by changing from one
algorithm to another, say from Q-learning to Sarsa, or to a different kind of exploration. One exception is tile coding. It would be okay to reimplement a variation of tile coding, but probably it would be better to use the existing code, and show me that you know enough to use it effectively, which is itself a valuable skill. It is easy to underestimate how much skill and experience you need to use tile coding to its full power.
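As an example of how small such a modification can be, the change from Q-learning to Sarsa is essentially one line of the update. Here is a sketch for a tabular agent; the representation of Q as a table indexed by state and then by action is just for illustration.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    # Sarsa bootstraps from the action actually taken next (on-policy)
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    # Q-learning bootstraps from the greedy action in the next state (off-policy)
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

Everything else in the agent - the exploration, the representation, the main loop - can stay the same.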
The second most common mistake is to not leave enough time to get good experimental results. Doing a good experiment, getting meaningful results, is not easy. It is harder than you think. It always seems to be the case that, when the time comes to submit a paper or a project, we would really like to run the experiments all over again one more time, maybe to make the results more statistically significant, or maybe to get rid of one annoying detail that we would rather have done a little bit differently. Or to run one more comparison to
better understand what happened. I put a big emphasis on clarity of
results. It is much better to have a small clear result than a vaguer
result on a large problem, where it is not quite clear what it means or
why it happened.
The above concerns are perhaps most relevant to the second project,
where you might do a little research. It is important for the first
project also that it be clear. It should be a clear application or
experience with reinforcement learning. It should be easy to explain,
to write up, and to read. But for the second project particularly, if
it is a little bit of research, then you will want to be very clear in
its conclusions.
In the past, when I've asked a class to do a project involving a little
bit of research, the most common and important mistake has been not to
have a clear hypothesis -- not to have a clear question being asked and
tested by the experimental research. Formulation of hypotheses is
something that we should understand completely if we see ourselves as
computer scientists -- that is, as scientists. A scientific experiment
should have a clear scientific hypothesis. An experiment should clearly
test or bear on the hypothesis. But the first step (hypothesis
formulation) is often very difficult for students. Formulating the
hypothesis is a difficult thing. Not difficult because it's hard like
lifting a heavy rock, but hard because you have to think clearly, hard
because you have to give up on all the possible things you might do and
pick one question to clearly ask. It's much easier to ask or wonder,
"what would happen if I applied reinforcement learning to some large
problem?" Or ask "what if I try to make this idea work?" Or "I
wonder what would happen if I tried so-and-so?" Instead, you have to ask something small and precise. Like, "in problem A, is Sarsa better than Q-learning?" or "does the best step size scale linearly or quadratically with the number of features in a state representation?" Or "in off-policy learning on the critterbot log file data, do we see instances where TD(lambda) diverges, but the new gradient-TD algorithms converge?" Or "on this problem, which of these two kinds of prioritized sweeping is more effective?"
It is also possible to do a small research project without those kinds
of very targeted small hypotheses. One can do a research project that
is more system building, or experience with system building. It is
perfectly okay to have a question, if you are working with the
critterbot for example, like "what can I learn about reinforcement
learning and robotics by trying to get the critterbot to move
efficiently in a straight line without bumping into things?" Or "what
can I discover by data mining in the critterbot log files?" These would
be fine. I guess these are okay because, although the results will not be completely clear, they will be of interest. There is a recognition
that working with real systems is important, that it is in some sense
an ultimate goal, and that any real progress with it, even if small, is
useful. But the need for clarity remains paramount. You have to be
objective about your experience, not overclaim, and say whatever small
things you can report back from your experience.
Okay, that's a little more guidance about the projects. The bottom line
is that you are to learn something, to gain some experience using
algorithms, and to produce something that lets me see that, and that shows me you are thinking clearly. Let us discuss this more in class today, particularly the issues of working in teams and of presentations in class. See you there,
Rich
P.S. If you want to respond to the class about this or anything, go to the course web page, go to the bottom, and click "extend this page". That is what I did to send this.
By the luck of the draw, the following will be the presentation order. Each grouping is meant to be a team, which would make a single presentation of approximately ten minutes. Let me know if any of the team groupings is not correct.
Thursday, April 2:
Ashique Mahmood, Shahin Jabbari Arfaee
Nolan Bard, Michael Johanson
Orsten Sterling
Yifeng Liu, Hengshuai Yao
Andrew Neitsch
Tuesday, April 7:
Richard Gibson, Nick Abou Risk
Reihaneh Rabbany, AmirAli Sharifi, Levi Lelis
Martha White
Michael Delp, Shahab Jabbari Arfaee?
Yaoliang Yu, Yongjian Zhang
I will consider requests to change days, but I'd like to keep things somewhat balanced across the two days. (I can't move everybody to the second day.) It is okay to present when you have not completed your project all the way; you should try to reflect on your experience and try to pass on what you have learned to the class.
Target your presentation to be brief. Briefly state the specific problem or problems you worked on, and what algorithms and algorithm variations you tried. Ideally, you would identify a research question at three levels: 1) a general informal level - something that you might have wondered as a lay person; 2) a level that could be described as a formal scientific hypothesis that could in principle be tested by experiment or analysis; and 3) a very specific level in the form of the particular experiment you did, the parameter values used, and what the outcomes could be or could have been. Then tell us what happened and what you learned.
If you want to use a computer to make your presentation, the best way is probably to bring your own with any needed adaptor. Or you could use mine - I have a Mac with PowerPoint, Keynote, and PDF-presenting software. In that case, email the presentation to me in advance so I can try it out first.
Here is the mark distribution for the exercises:
2.1: 2 pts.
2.5: 8 pts.
2.55: 28 pts.
2.8: 2 pts. (extra credit)
Be sure to answer every sub-question in every exercise - that's how the marking is done.
Questions 3.5, 3.8, & 3.11 - 4 pts. each
Question 3.9 - 5 pts.
Questions 3.10 & 3.17 - 6 pts. each
Exercise 4.1 - 6 pts.
Exercise 4.2 - 7 pts.
Exercise 4.3 - 9 pts.
Exercise 4.5 - 8 pts.
Exercise 4.9 - 4 pts.
Exercise 5.1 - 3 pts.