Reinforcement Learning and Artificial Intelligence (RLAI)
Intrinsic Motivation
Edited by Mark Ring
The ambition of this
page is to keep track of some of the things going on in the Intrinsic
Motivation meeting at the University of Alberta. Research continues, but this meeting is currently dormant.
There is an intrinsic motivation mailing list hosted at Rutgers
University. To get on the list, send email to
mlittman@cs.rutgers.edu and give a good reason for being on it, such as
that you are actively working on these ideas on an Aibo or in a
computational world.
We have the playroom code now, so contact anna@cs if you want it.
2004/12/6
-
The plan from last week for this meeting was to do the survey paper on
exploration by Sebastian Thrun, but now Rich is going to present K1
(the successor to IDBD) in MLRG on the same day. So I propose we do
just the K1 paper on Monday, at MLRG, making that the intrinsic
motivation meeting for the day. The Thrun paper would be postponed to
next week.
2004/11/29
-
Rich would like to present a few slides
on K1, the
successor to IDBD. A paper on this topic is available
here.
We have to do an experiment and get some results. Gather
your thoughts and submit ideas.
Satinder will probably join our meeting.
- Let's define a problem and get results on intrinsic motivation.
- What exactly does "Learning feels good" mean?
- Are we still maximizing the total reward? YES
- Learn how to explore.
- Is intrinsic motivation a subset of exploration? Maybe; we are not sure.
- Thrun's 1992 technical report might have some ideas about this issue.
- Alborz will look into it further.
- You can get the paper from here.
2004/11/22
-
David presented a new method for exploration. Slides are available here.
David will work on this new idea and test it in an empirical environment.
2004/11/15
-
There was a brainstorming session about intrinsic motivation. The main
subjects considered in the session were:
- Maximizing long-term reward
- Learning to predict & control salient events
- Setting & achieving your own subgoals
- Exploring, as in learning the spatial layout
- Learning about surprising or novel events
- Exploring, as in e-greedy or an "exploration bonus"
- Finding a more compact/simpler representation/explanation
- A sense of accomplishment
=>
- Learning to control, and to predict insofar as it leads to control
- Some things are more important just because:
  - they are linked to things of primitive interest
  - they achieve things or give control over things
2004/11/08
-
Satinder will video-conference in and
show a video of the Aibo learning at the option level. We will also
go through Michael Littman's slides from the DARPA site visit at
Rutgers. We should also figure out what to tell him to
characterize what we are up to.
2004/11/01
-
Mark joined us by
iChat and we discussed the layout for a
large grid world, which we intend to use to explore exploring. Click
here to see a cleaned-up version
of what we had on the board.
2004/10/25
-
Rich suggested we get started working
on intrinsic motivation for exploration, balanced against other goals,
in the big gridworld environment. Brian suggested we use the
playroom world from the paper by Satinder, Andy, and Nuttapong, which
we discussed the previous week. We discussed the playroom paper again,
trying to understand exactly how the playroom worked and what the paper
demonstrated. We would like to test how preferring options compares to
intrinsic reward, and to test the IDBD idea.
We had a specific proposal: could we get similar results to the
playroom paper if we left out all the curiosity machinery and used a
simpler exploration-encouraging method, such as optimistic
initialization of the option values? It is possible that most of
the effect comes simply from providing just the right options and then
preferring to take options over actions whenever possible.
Perhaps we should get the playroom/curiosity code and compare with the
playroom paper's results.
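To make the proposal concrete, here is a minimal sketch of the optimistic-initialization alternative, assuming a tabular learner over a combined set of primitive actions and options. The state count, constants, and function names are illustrative placeholders, not taken from the playroom code.

    # Minimal sketch (not the playroom code): tabular SMDP-style Q-learning over
    # primitive actions and options, where exploration comes from optimistic
    # initial option values instead of an intrinsic curiosity reward.
    import numpy as np

    N_STATES = 100                   # placeholder number of gridworld states
    N_ACTIONS, N_OPTIONS = 4, 6      # primitive actions and temporally extended options
    N_CHOICES = N_ACTIONS + N_OPTIONS
    STEP_SIZE, GAMMA, OPTIMISM = 0.1, 0.99, 10.0

    # Every state-option value starts optimistically high, so each option gets
    # tried until experience pulls its value down; primitive actions stay at a
    # neutral value, so options are preferred whenever one is available.
    Q = np.zeros((N_STATES, N_CHOICES))
    Q[:, N_ACTIONS:] = OPTIMISM

    def choose(state):
        # Greedy selection is enough; the optimism itself drives exploration.
        return int(np.argmax(Q[state]))

    def update(state, choice, reward, next_state, k):
        # Standard SMDP backup; k is the number of time steps the choice lasted
        # (k = 1 for a primitive action) and reward is the discounted return
        # accumulated while it ran.
        target = reward + (GAMMA ** k) * np.max(Q[next_state])
        Q[state, choice] += STEP_SIZE * (target - Q[state, choice])

If this simple scheme reproduces most of the playroom results, it would suggest that the benefit comes from the options themselves rather than from the curiosity reward.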
2004/10/18 -
We discussed the playroom paper by Satinder, Andy, and Nuttapong. We
would like to get the playroom code and try our own stuff on it,
particularly using the step-size parameter of IDBD as the reward for
salient states, rather than the error. Using the error does not guard
against being rewarded for being utterly confused, whereas IDBD's alpha
will decrease if learning is not improving.
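A minimal sketch of that idea, assuming a linear predictor trained with IDBD: the intrinsic reward for a salient event is taken from the adapted step size rather than from the prediction error. Taking the mean step size as the reward is just one illustrative choice, not something we have settled on.

    # Minimal sketch: IDBD for a linear predictor, with the adapted step size
    # (alpha) used as the intrinsic reward instead of the prediction error.
    import numpy as np

    class IDBD:
        def __init__(self, n_features, theta=0.01, init_alpha=0.05):
            self.w = np.zeros(n_features)                        # weights of the linear predictor
            self.beta = np.full(n_features, np.log(init_alpha))  # log step sizes
            self.h = np.zeros(n_features)                        # IDBD's memory trace
            self.theta = theta                                   # meta step size

        def update(self, x, target):
            delta = target - np.dot(self.w, x)               # prediction error
            self.beta += self.theta * delta * x * self.h     # adapt the log step sizes
            alpha = np.exp(self.beta)                        # per-weight step sizes
            self.w += alpha * delta * x                      # weight update
            self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
            return delta, alpha

    def intrinsic_reward(alpha):
        # Reward the step size, not the error: alpha stays high while learning is
        # still making progress and shrinks when it is not, whereas the raw error
        # can stay large forever for an event that is simply unpredictable.
        return float(np.mean(alpha))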
Today we had a two-hour meeting on intrinsic motivation, with the primary question appearing to be "Is there more to intrinsic motivation than merely the desire to maximize long-term reward?" A number of ideas for motivating the agent to explore, to learn to predict, to learn to control its sensations, etc., were brainstormed. Rich made a list of behaviors/tasks such as "learn to predict and control salient events", "learn about surprising or novel events", etc.
I would like to discuss (i) whether the question "Is there really intrinsic reward/motivation, and if so, what is it?" makes sense, and (ii) whether it actually matters at all. I think the fundamental question we can never avoid is "What is the grand objective of an AI agent?" This seems to be a very important and foundational question in AI. RL answers it as "The objective is to maximize the long-term scalar reward signal supplied by the environment." Thus, if the reward signal is provided, then things are pretty cut and dried, and it is exactly the maximization of that signal that measures everything. For instance: Does learning feel good? Well, only as long as it helps maximize returns. Does gaining control over the agent's sensations feel good? Well, only if it helps maximize returns, and so on. Only maximizing long-term reward makes anything useful.
Thus, it seems to me that in the RL framework, the questions of the existence of intrinsic reward and of its definition are really in the eye of the beholder. It appears similar to Rich's answer to "What is the goal of an agent?": "It is whatever helps us (the observers) to predict the agent's behavior." We, the observers, attribute a goal to an agent. By the same token, I would like to propose that intrinsic motivation is something that we, the observers, can attribute to the agent if it helps us explain/predict its behavior. Thus, the question of whether the agent is intrinsically motivated is somewhat meaningless.
Summary:
1. If we are observing an agent, then _we_ attribute intrinsic motivation to it to help us explain/predict its behavior.
2. If we are building an RL agent and already have a reward function for it, then introducing intrinsic motivation is similar to giving it shaping rewards, providing an initial Q function, creating a state abstraction, engineering options, etc. All of these are _means_ to make maximizing the "real" long-term reward faster and more efficient (a minimal sketch of this view follows the list).
3. So, perhaps, instead of asking whether intrinsic motivation "really" exists and what it is, we should ask what we can add to the agent's learning mechanism to make it maximize its reward faster and more efficiently. That is the only goal an RL agent pursues; everything else is a means to it.
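As a minimal sketch of point 2, the intrinsic term can be written as nothing more than a bonus added to the environment's reward, leaving the agent with a single scalar signal to maximize. The bonus function and the weight eta below are hypothetical placeholders.

    # Minimal sketch of point 2: intrinsic motivation as an added bonus term,
    # like a shaping reward. The agent still maximizes one scalar signal; the
    # bonus and its weight eta are hypothetical placeholders.
    def combined_reward(extrinsic_reward, intrinsic_bonus, eta=0.1):
        return extrinsic_reward + eta * intrinsic_bonus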
Thoughts?
-Vadim