Reinforcement Learning and Artificial Intelligence (RLAI) CMPUT 609: Reinforcement Learning for Artificial Intelligence

The ambition of this web page is to be the official, central site for information, software, and handouts for CMPUT 609 (a course at the University of Alberta in Winter 2009).  There are also pointers here to slides for the course and to related courses elsewhere.

See the schedule for assignments.

See the web page for the book and check out the "errata/notes" page.

If you are taking the course in any capacity, please subscribe to this web page by clicking the "subscribe" link at the bottom of the page.  Then you will be kept apprised of announcements related to the course.

Reading diary entries should be emailed to sutton@cs.ualberta.ca. He will read them and send back a cryptic marking: points out of four.

Written exercises should be submitted on paper to Mohammad Shafiei at the beginning of class on the day they are due.  Only one copy need be submitted.  It will be marked and returned to you.  Assignments must be turned in on time because answer sheets will be handed out on the day the assignment is due.

### Slides

Web viewable (slightly old):
Source files (powerpoint) as a tar archive (Jan 06)

## A Few of the Related Courses Elsewhere

Exercise 2.5 asks for pseudocode, meaning a way of writing the code that makes the algorithmic ideas clear without presuming any particular language, and without worrying about all the details.  There should be no need to explain data structures or implementation details of that type.  For examples of the kind of pseudocode we are looking for, see that used in later chapters, such as in Figures 4.1 (page 92) or 6.9 (page 146). In addition, here is a template for the answer to this question:

Repeat for a = 1 to n:        ; initialization
...

Repeat forever:
a = ...
r = bandit(a)
...

Here is the mark distribution for the exercises:

2.1:  2pts.
2.5:  8pts.
2.55: 28pts.
2.8:  2pts. (extra credit)

Be sure to answer every sub-question in every exercise; that's how the marking is done.

Questions 3.5, 3.8, & 3.11    - 4 pts. each
Question 3.9                      - 5 pts.
Questions 3.10 & 3.17          - 6 pts. each

Exercise 4.1 - 6 pts.
Exercise 4.2 - 7 pts.
Exercise 4.3 - 9 pts.
Exercise 4.5 - 8 pts.
Exercise 4.9 - 4 pts.

Exercise 5.1 - 3 pts.

Exercise 5.2 - 4 pts.
Exercise 5.5 - 4 pts.

Exercise 7.2 - 3 pts.
Exercise 7.6 - 6 pts.

Exercise 8.1 - 3 pts.
Exercise 8.2 - 4 pts.
Exercise 8.6 - 3 pts.
Exercise 8.7 - 2 pts.

Exercise 9.1 - 4 pts.
Exercise 9.2 - 3 pts.
Exercise 9.3 - 3 pts.
Exercise 9.5 - 6 pts.
Exercise 9.6 - 2 pts. (Extra Credit)

A good survey paper on the neuroscience of reinforcement learning can be found here: http://dx.doi.org/10.2976/1.2732246.

Dear class,

I am writing to provide a little more guidance on the class projects. The truth is that I'm not exactly sure how best to structure them, or how much to structure them. I want you to gain some experience programming and using the learning algorithms. But I don't want to provide too much structure and inhibit your creativity, and prevent you from trying things that interest you. We will have to find a balance.

I don't know what is the best way to structure the projects, but I do know some of the mistakes that you might make. So let me at least explain some of those, and we'll see if you can avoid them.

The most common mistake is to try something too ambitious. Using reinforcement learning algorithms is actually a bit more difficult than it probably seems at this point. There are parameters, and there are many issues in the setup of a problem, in the choice of reward, and there are many many issues in representations and choice of state space. The importance of all these things is understated in the book, and is thus probably not very apparent. Perhaps because of this, a common mistake is to start a project that is too ambitious, that addresses some large problem involving many AI issues. In this case, the project is unlikely to have a successful outcome in the available time.

To deal with the risk of this mistake, I have urged you to think of the first project in particular as a simple one, to be done quickly. Further, you may want to think of the first project as being an application or illustration of what we have learned about in class and in the book, whereas the second project you might want to think of as going beyond that to have just a little bit of research content. (Learning to do research is one of the main things you have to do in graduate school.) So, a good choice for a first project could be something like just writing an agent for mountain-car, or Acrobot, say using function approximation with tile coding. That would be conceptually straightforward and would start giving some experience. In the past I have assigned exactly that (mountain car w/tile coding) as a project and viewed it as the final successful achievement of the course. So getting that far should not be viewed as a small thing. It would be a good outcome. (And in the second project, we should only be trying to go a little bit farther.) Another nice thing about a very straightforward project like that is it would be easy for me to understand your results. It'll make your write-up almost trivial to write, and for me to read. Thinking a little more generally, you might take any problem from the RL library, or RL competition, describe and implement an experiment with it, and that could be your first project.  There are also simple things to do with the critterbot log files.  For example, you could take one or a few of the signals and treat it as the reward.  Then you could use the others to try to predict it at various time scales (that is, for various values of gamma).  This would illustrate learning with TD(lambda).  Then you could compare performance with MC or for various values of lambda.
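The log-file prediction idea can be sketched roughly as follows. This is only a minimal illustration of linear TD(lambda) with accumulating traces; it assumes the log has already been converted into a list of feature vectors and a list of reward values. The actual critterbot log format is not described here, so the function name, argument names, and data layout are all hypothetical.

```python
def td_lambda_predict(features, rewards, gamma, lam, alpha):
    """Linear TD(lambda) with accumulating traces over one logged trajectory.

    features[t] is the feature vector at step t (built from the signals that
    are NOT being treated as the reward); rewards[t] is the chosen reward
    signal on the transition from step t to step t + 1.
    Both layouts are assumptions made for this sketch.
    Returns the learned weight vector.
    """
    n = len(features[0])
    w = [0.0] * n  # weights of the linear predictor
    e = [0.0] * n  # eligibility traces
    for t in range(len(features) - 1):
        x, x_next = features[t], features[t + 1]
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        delta = rewards[t] + gamma * v_next - v  # TD error
        # Decay the traces, then add the current features.
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w
```

Running this for several values of gamma gives predictions at several time scales, and running it for several values of lam (with lam = 1 behaving like a Monte Carlo method) gives the comparison suggested above. The step size alpha would need tuning for real data.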

The main thing I'm looking for in the first project is to see you implement a reinforcement learning algorithm, and actually try it and get it to work on a problem. You could find versions of the algorithms on the web and in the library, but I'd like to see that you understand enough to write your own agent, or at least be able to modify an existing agent. So if you do work with something that you found, it would be good if you modified it somewhat, perhaps by changing from one algorithm to another, say from Q-learning to Sarsa, or to a different kind of exploration. One exception is tile coding. It would be okay to reimplement a variation of tile coding, but probably it would be better to use the existing code, and show me that you know enough to use it effectively, which is itself a valuable skill.  It is easy to underestimate how much skill and experience you need to use tile coding up to its full powers.
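To give a sense of how small the change from Q-learning to Sarsa can be in a tabular agent, here is a minimal sketch. It assumes action values are stored as a dictionary of dictionaries, q[state][action], and that next_state is not terminal; the function names and the storage layout are hypothetical and not taken from any particular library.

```python
def td_target(q, reward, next_state, next_action, gamma, algorithm):
    """One-step target for tabular Q-learning or Sarsa.

    q[state][action] holds the current action-value estimates.
    Assumes next_state is non-terminal (a terminal state would
    contribute no bootstrap term).
    """
    if algorithm == "q-learning":
        # Off-policy: bootstrap from the greedy action in the next state.
        bootstrap = max(q[next_state].values())
    elif algorithm == "sarsa":
        # On-policy: bootstrap from the action actually taken next.
        bootstrap = q[next_state][next_action]
    else:
        raise ValueError("unknown algorithm: " + algorithm)
    return reward + gamma * bootstrap


def update(q, state, action, reward, next_state, next_action,
           alpha, gamma, algorithm):
    """Move q[state][action] a fraction alpha toward the chosen target."""
    target = td_target(q, reward, next_state, next_action, gamma, algorithm)
    q[state][action] += alpha * (target - q[state][action])
```

The only difference between the two algorithms in this sketch is which next-state value is used for bootstrapping; the rest of the agent (exploration, stepping the environment) is unchanged.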

The second most common mistake is to not leave enough time to get good experimental results. Doing a good experiment, getting meaningful results, is not easy. It is harder than you think. It always seems that when the time comes to submit a paper or a project, we would really like to run the experiments all over again one more time, maybe to make the results more statistically significant, or maybe to get rid of one annoying detail that you would rather have done a little bit differently. Or to run one more comparison to better understand what happened. I put a big emphasis on clarity of results. It is much better to have a small clear result than a vaguer result on a large problem, where it is not quite clear what it means or why it happened.

The above concerns perhaps are most relevant to the second project, where you might do a little research. It is important for the first project also that it be clear. It should be a clear application or experience with reinforcement learning. It should be easy to explain, to write up, and to read. But for the second project particularly, if it is a little bit of research, then you will want to be very clear in its conclusions.

In the past, when I've asked a class to do a project involving a little bit of research, the most common and important mistake has been not to have a clear hypothesis -- not to have a clear question being asked and tested by the experimental research. Formulation of hypotheses is something that we should understand completely if we see ourselves as computer scientists -- that is, as scientists. A scientific experiment should have a clear scientific hypothesis. An experiment should clearly test or bear on the hypothesis. But the first step (hypothesis formulation) is often very difficult for students. Formulating the hypothesis is a difficult thing. Not difficult because it's hard like lifting a heavy rock, but hard because you have to think clearly, hard because you have to give up on all the possible things you might do and pick one question to clearly ask. It's much easier to ask or wonder, "what would happen if I applied reinforcement learning to some large problem?"  Or ask "what if I try to make this idea work?" Or "I wonder what would happen if I tried so and so?" Instead, you have to ask something small and precise. Like, "in problem A, is Sarsa better than Q-learning?" or "does the best step size scale linearly or quadratically with the number of features in a state representation?" Or "in off-policy learning on the critterbot log file data, do we see instances where TD(lambda) diverges, but the new gradient TD algorithms converge?" Or "on this problem, which of these two kinds of prioritized sweeping is more effective?"

It is also possible to do a small research project without those kinds of very targeted small hypotheses. One can do a research project that is more system building, or experience with system building. It is perfectly okay to have a question, if you are working with the critterbot for example, like "what can I learn about reinforcement learning and robotics by trying to get the critterbot to move efficiently in a straight line without bumping into things?" Or "what can I discover by data mining in the critterbot log files?" These would be fine. I guess these are okay because, although the results will not be completely clear, they will be of interest. There is a recognition that working with real systems is important, that it is in some sense an ultimate goal, and that any real progress with it, even if small, is useful. But the need for clarity remains paramount. You have to be objective about your experience, not overclaim, and say whatever small things you can report back from your experience.

Okay, that's a little more guidance about the projects. The bottom line is that you are to learn something, to gain some experience using algorithms, and to produce something that will let me see that, and that lets me see that you are thinking clearly. Let us discuss this more in class today. Particularly the issues of working in teams, and of presentations in class. See you there,

Rich

p.s. if you want to respond to the class about this or anything, go to the course web page, go to the bottom, and click "extend this page".  that is what i did to send this.

By the luck of the draw, the following will be the presentation order.  Each grouping is meant to be a team, which would make a single presentation of approximately ten minutes.  Let me know if any of the team groupings is not correct.

Thursday, April 2:

Ashique Mahmood
Shahin Jabbari Arfaee

Nolan Bard
Michael Johanson

Orsten Sterling

Yifeng Liu
Hengshuai Yao

Andrew Neitsch

Tuesday, April 7:

Richard Gibson
Nick Abou Risk

Reihaneh Rabbany
AmirAli Sharifi
Levi Lelis

Martha White

Michael Delp
Shahab Jabbari Arfaee?

Yaoliang Yu
Yongjian Zhang

I will consider requests to change days, but I'd like to keep things somewhat balanced across the two days.  (I can't move everybody to the second day.)  It is ok to present when you have not completed your project all the way.  You should try to reflect on your experience and try to pass on what you have learned to the class.

Target your presentation to be brief.  Briefly state the specific problem or problems you worked on, and what algorithms and algorithm variations you tried. Ideally, you would identify a research question at three levels: 1) a general informal level - here is something that you might have wondered as a lay person; 2) a level that could be described as a formal scientific hypothesis that could in principle be tested by experiment or analysis; and 3) a very specific level in the form of the specific experiment you did, the particular parameter values used and what the outcomes could be or could have been. Then tell us what happened and what you learned.

If you want to use a computer to make your presentation, the best way is probably to bring your own with any needed adaptor.  Or you could use mine - I have a Mac with PowerPoint, Keynote, and PDF presenting software.  In that case, email the presentation to me in advance so I can try it out first.

rich