Reinforcement Learning and
Edited by Mark Ring
The ambition of this
page is to keep track of some of the things going on in the Intrinsic
Motivation meeting at the University of Alberta. Research continues, but this meeting is currently dormant.
There is an intrinsic motivation mailing list hosted at Rutgers
University. To get on the list, send email to
firstname.lastname@example.org and give a good reason for being on it, such as
that you are actively working on these ideas on an aibo or
We have the playroom code now, so contact anna@cs if you want it.
The plan from last week for this
meeting was to do the survey paper on exploration by Sebastian
Thrun, but now Rich is going to present K1 (successor to IDBD) in
MLRG on the same day. So I propose we do just the K1 paper for
Monday, at MLRG, making that the intrinsic motivation meeting for the
day. The Thrun paper would be postponed to next week.
Rich would like to present a few slides
on K1, the
successor to IDBD. A paper on this topic is available here
We have to do an experiment and get some results. Gather
your thoughts and submit ideas.
Satinder will probably join our meeting.
- Let's define a problem and get results about
- What exactly does "Learning feels good" mean?
- Still maximizing the total rewards? YES
- Learn how to explore
- Is intrinsic motivation a subset of exploration? Maybe, we are
- Thrun's 1992 technical report might have some ideas about this
- Alborz will look into that more.
- You can get the paper from here.
David presented a new method
for exploration. Slides are available here.
David will work on this new idea and test it in an
There was a brainstorming session
about Intrinsic Motivation. The main subjects considered on the
- Maximizing long-term reward
- Learn to predict & control salient events
- Setting & achieving your own subgoals
- Explore as in learning spatial layout
- Learn about surprising or novel events
- Explore as in e-greedy, "exploration bonus"
- Finding a more compact/simpler representation/explanation
- Sense of accomplishment
to control, and predict as leads to control
- some things are more important just because
- linked to things of primitive interest
- achieve things, give control over things
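Two of the exploration notions listed above, e-greedy and an "exploration bonus", can be sketched in a few lines. This is only an illustration: the function names, the visit-count bonus of the form bonus_scale / sqrt(count + 1), and the parameter names are assumptions made here, not something decided in the meeting.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def bonus_greedy(q_values, counts, bonus_scale):
    """Pick the action maximizing estimated value plus a count-based bonus.

    The bonus shrinks as an action is tried more often (hypothetical form).
    """
    def score(a):
        return q_values[a] + bonus_scale / math.sqrt(counts[a] + 1)

    return max(range(len(q_values)), key=score)
```

The first rule explores at random regardless of what has been tried; the second directs exploration toward rarely tried actions, which is closer in spirit to the novelty item on the list.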
Satinder will video-conference in and
show a video of aibo learning at the option level. We will also
go through Michael Littman's slides from the DARPA site visit at
Rutgers. We should also figure out what to tell him to
characterize what we are up to.
Mark joined us by iChat
and we discussed the layout for a
large grid world, which we intend to use to explore exploring. Click here
to see a cleaned-up version
of what we had on the board.
Rich suggested we get started working
on intrinsic motivation for exploration, balanced against other goals,
in the big gridworld environment. Brian suggested we use the
playroom world from the paper by Satinder, Andy and Nuttapong which
we discussed the previous week. We discussed the playroom
paper again, trying to understand exactly how the playroom
worked and what the paper demonstrated. We would like to test how
preferring options compares to intrinsic
reward, and test the IDBD idea.
We had a specific proposal -- could you get similar results to the
playroom paper if you left out all the curiosity stuff and used a
simpler exploration encouraging method, such as optimistic
initialization of the option values? It is possible that most of
the effect comes simply from providing just the right options and then
preferring to take options over actions whenever possible.
Perhaps we should get the playroom/curiosity code and compare with the
playroom paper's results.
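As a rough illustration of the optimistic-initialization proposal, the sketch below uses a purely greedy learner over a handful of "options" with fixed payoffs. Everything here is an assumption made for illustration: the one-step (bandit-style) setting, the function name run_greedy, the payoff values, and the step size. It is not the playroom environment or its code.

```python
def run_greedy(initial_value, rewards, steps, step_size=0.5):
    """Greedy selection over option values that all start at initial_value.

    rewards[i] is a hypothetical fixed payoff for option i.
    Returns the final value estimates and the set of options ever tried.
    """
    q = [initial_value] * len(rewards)
    tried = set()
    for _ in range(steps):
        # Always take the option that currently looks best (ties: lowest index).
        a = max(range(len(q)), key=lambda i: q[i])
        tried.add(a)
        # Move the estimate part of the way toward the observed payoff.
        q[a] += step_size * (rewards[a] - q[a])
    return q, tried


payoffs = [0.2, 0.5, 0.9]
q_zero, tried_zero = run_greedy(0.0, payoffs, steps=50)
q_opt, tried_opt = run_greedy(5.0, payoffs, steps=50)
```

With values initialized at zero the learner never leaves the first option, while with values initialized well above any payoff each option disappoints in turn, so all of them get tried without any explicit curiosity term.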
We discussed the playroom paper
by Satinder, Andy and Nuttapong. We would like to get the
playroom code and try our own stuff on it, particularly using the
step-size parameter of IDBD as the
reward for salient states, rather than the error. Using the error does
not guard against being rewarded for being utterly confused, whereas
IDBD's alpha will decrease if learning is not improving.
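For reference, a minimal sketch of the IDBD update for a linear predictor, following the update equations in Sutton's 1992 IDBD paper, shows how the per-input step size alpha_i = exp(beta_i) moves. The function name and the list-based representation are assumptions made here, and treating alpha as a reward signal is the idea under discussion, not an established method.

```python
import math


def idbd_update(w, beta, h, x, target, theta):
    """One IDBD step for a linear predictor; w, beta and h are updated in place.

    w: weights, beta: log step sizes, h: per-input memory trace,
    theta: meta step size. Returns the prediction error delta.
    """
    delta = target - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Step size grows when the current error correlates with recent updates.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        w[i] += alpha * delta * xi
        # Decaying trace of recent weight changes for input i.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta
```

When successive errors point the same way, beta rises and so does alpha; when they alternate in sign, as they would for an unpredictable target, beta falls. Under the proposal above, a hypothetical intrinsic reward could be read off as math.exp(beta[i]) for the relevant input.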