Reinforcement Learning and Artificial Intelligence (RLAI)
Intrinsic Motivation
Edited by Mark Ring. Created by Anna, October 22, 2004.

The aim of this page is to keep track of some of the things going on in the Intrinsic Motivation meeting at the University of Alberta.  Research continues, but the meeting is currently dormant.

There is an intrinsic motivation mailing list hosted at Rutgers University.  To get on the list, send email to mlittman@cs.rutgers.edu and give a good reason for being on it, such as that you are actively working on these ideas on an Aibo or in a computational world.

We have the playroom code now, so contact anna@cs if you want it.

2004/12/6 -

The plan from last week for this meeting was to do the survey paper on exploration by Sebastian Thrun.  But now Rich is going to present K1 (successor to IDBD) in MLRG on the same day.  So I propose we do just the K1 paper for Monday, at MLRG, making that the intrinsic motivation meeting for the day.  The Thrun paper would be postponed to next week.  Rich, Dec 3, 2004

2004/11/29 -

Rich would like to present a few slides on K1, the successor to IDBD. A paper on this topic is available here.  We have to do an experiment and get some results, so gather your thoughts and submit ideas.
Satinder will probably join our meeting.

2004/11/22 -

David presented a new method for exploration. Slides are available here.
David will work on this new idea and test it empirically.

2004/11/15 -
There was a brainstorming session about Intrinsic Motivation. The main subjects considered at the session were:

2004/11/08 -

Satinder will video-conference in and show a video of an Aibo learning at the option level.  We will also go through Michael Littman's slides from the DARPA site visit at Rutgers.  We should also figure out what to tell him to characterize what we are up to.

2004/11/01 -

Mark joined us by iChat and we discussed the layout for a large grid world, which we intend to use to explore exploring. Click here to see a cleaned-up version of what we had on the board.

2004/10/25 -

Rich suggested we get started working on intrinsic motivation for exploration, balanced against other goals, in the big gridworld environment.  Brian suggested we use the playroom world from the paper by Satinder, Andy and Nuttapong, which we discussed the previous week.  We discussed the playroom paper again, trying to understand exactly how the playroom worked and what the paper demonstrated. We would like to test how preferring options compares to intrinsic reward, and to test the IDBD idea.

We had a specific proposal: could we get similar results to the playroom paper if we left out all the curiosity machinery and used a simpler exploration-encouraging method, such as optimistic initialization of the option values?  It is possible that most of the effect comes simply from providing just the right options and then preferring to take options over primitive actions whenever possible.  Perhaps we should get the playroom/curiosity code and compare against the playroom paper's results; a rough sketch of the optimistic-initialization idea appears below.
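For concreteness, here is a minimal sketch of what the optimistic-initialization alternative could look like in a tabular SMDP Q-learning setting (Python). Nothing here is taken from the playroom code; the environment interface (reset/run_option), the option set, and all constants are assumptions made purely for illustration.

    from collections import defaultdict

    ALPHA = 0.1      # step size (assumed)
    GAMMA = 0.9      # discount factor (assumed)
    OPTIMISM = 10.0  # initial value for every (state, option) pair

    def make_q(optimism=OPTIMISM):
        # Every unseen (state, option) pair starts out optimistic, which by itself
        # pushes the agent to try each option at least once, with no curiosity term.
        return defaultdict(lambda: optimism)

    def greedy_option(q, state, options):
        return max(options, key=lambda o: q[(state, o)])

    def run_episode(env, q, options):
        """One episode of SMDP Q-learning over options, with no intrinsic reward."""
        state = env.reset()
        done = False
        while not done:
            option = greedy_option(q, state, options)
            # env.run_option is assumed to execute the option until it terminates and
            # return (next state, discounted extrinsic return, duration k, done flag).
            next_state, ret, k, done = env.run_option(state, option)
            target = ret if done else ret + (GAMMA ** k) * max(
                q[(next_state, o)] for o in options)
            q[(state, option)] += ALPHA * (target - q[(state, option)])
            state = next_state
        return q

If the playroom results could be matched by something this simple, that would suggest the curiosity reward itself is doing less of the work than the choice of options.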

2004/10/18 -

We discussed the playroom paper by Satinder, Andy and Nuttapong. We would like to get the playroom code and try our own stuff on it, particularly using the step-size parameter of IDBD as the reward for salient states, rather than the error. Using the error does not guard against being rewarded for being utterly confused, whereas IDBD's step size (alpha) will decrease if learning is not improving.
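To make the proposal concrete, here is a rough sketch of a linear IDBD learner in the style of Sutton (1992), with the mean step size returned as the intrinsic reward for a salient state. The choice of the mean alpha as the reward signal, and all constants, are our assumptions for illustration, not anything taken from the playroom paper or code.

    import numpy as np

    class IDBDPredictor:
        def __init__(self, n_features, init_alpha=0.05, meta_rate=0.01):
            self.w = np.zeros(n_features)                        # prediction weights
            self.beta = np.full(n_features, np.log(init_alpha))  # log step sizes
            self.h = np.zeros(n_features)                        # IDBD memory trace
            self.theta = meta_rate                               # meta learning rate

        def update(self, x, target):
            delta = target - self.w @ x                # prediction error
            self.beta += self.theta * delta * x * self.h
            alpha = np.exp(self.beta)                  # per-feature step sizes
            self.w += alpha * delta * x
            self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
            return delta, alpha

    def intrinsic_reward(predictor, x, target):
        # Error-based curiosity would return abs(delta), which stays large when an
        # event is simply unpredictable.  Using the mean step size instead rewards
        # events the learner is still making progress on, and fades as alpha shrinks.
        delta, alpha = predictor.update(x, target)
        return float(alpha.mean())

The point of the comparison is the last two lines: the same learner supports either reward signal, so in principle the two schemes could be tested side by side in the same environment.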


11/15/04

Today we had a two-hour meeting on intrinsic motivation, with the primary question appearing to be "Is there more to intrinsic motivation than merely the desire to maximize long-term reward?"  A number of ideas for motivating the agent to explore, to learn to predict, to learn to control its sensations, and so on, were brainstormed. Rich made a list of behaviors/tasks such as "learn to predict and control salient events", "learn about surprising or novel events", etc.

I would like to discuss (i) whether the question "Is there really intrinsic reward/motivation and if so then what is it?" makes sense and (ii) whether it actually matters at all. I think the fundamental question which we can never avoid is "What is the grand objective of an AI agent?" This seems to be a very important and foundational question in AI. RL answers it as "The objective is to maximize the long-term scalar reward signal supplied by the environment". Thus, if the reward signal is provided, then the matter is pretty cut and dried: everything else is measured by exactly how well it helps maximize that signal. For instance: Does learning feel good? Well, only as long as it helps maximize returns. Does gaining control over the agent's sensations feel good? Well, only if it helps maximize returns, and so on. Maximizing long-term reward, and only that, is what makes anything useful.

Thus, it seems to me that in the RL framework, the questions of whether intrinsic reward exists and how it should be defined are really in the eye of the beholder. It appears similar to Rich's answer to "What is the goal of an agent?" --- "It is whatever helps us (the observers) to predict the agent's behavior". We, the observers, attribute a goal to an agent. By the same token, I would like to propose that intrinsic motivation is something that we, the observers, can attribute to the agent if it helps us explain/predict its behavior. Thus, the question of whether the agent is intrinsically motivated is somewhat meaningless.

Summary:
1. if we are observing an agent then _we_ attribute intrinsic motivation to it to help us explain/predict its behavior;
2. if we are building an RL agent and already have a reward function for it, then introducing intrinsic motivation is similar to giving it shaping rewards, providing an initial Q function, creating a state abstraction, engineering options, etc. All of these are _means_ of making the maximization of the "real" long-term reward faster and more efficient;
3. so, perhaps, instead of asking whether intrinsic motivation "really" exists and what it is, we should ask what we can add to the agent's learning mechanism to make the agent maximize its reward faster and more efficiently. That is the only goal an RL agent pursues. Everything else is a means for it.

Thoughts?

-Vadim 

I like Vadim's summary above.  I think Point 2 is the most important---intrinsic motivation is a kind of shaping reward that we need to put into our RL systems so that they can learn in extremely difficult domains.  I think Point 1 is valid, although I believe it is secondary.  In particular, saying that "intrinsic motivation is in the eye of the beholder" is akin to saying "gravity is in the eye of the beholder".  In both cases, it is a theoretical construct that we use to explain our observations of the natural world.  But, heck, at some point an explanation becomes sufficiently compelling that it is useful to say that it is in the world.  I'm not saying that intrinsic motivation is necessarily in this category, but I think one could make a strong case that it is.

I've been thinking a bit about how we could formulate an experiment to actually quantify some intrinsic rewards.  For example, there are behavioral experiments that show people giving up money for the opportunity to punish someone who isn't being fair.  I think revenge is an intrinsic reward---sometimes people get an urge to get back at someone even when it goes against other more "rational" rewards.  The other intrinsic motivations mentioned above probably have the same properties---people would be willing to satisfy these motivations even if it costs money (up to a point).  Putting the various motivations on a scale (money is one, but there could be others) could be extremely valuable in getting a handle on what they are and how they work.

-Michael (11/16/04)


I would like to discuss a particular point Michael made above. Namely, when Michael says:

"I think revenge is an intrinsic reward---sometimes people get an urge to get back at someone even when it goes against other more 'rational' rewards."

he seems to imply that 'rational rewards' are the _extrinsic_ / _real_ rewards in human life. This brings us to one of the most important questions in the 'RL for Human Intelligence' story:

What are the real/external RL rewards in human life?

It appears important to define these, for otherwise we have no basis to talk about intrinsic (i.e., internal) rewards. But could it be that pleasure from revenge is just as external/real to us as pleasure from quenching hunger or thirst?

More generally:

Is there a reward signal whose discounted sum humans try to maximize?

But then, perhaps, this existential question is really meaningless because rewards are in the eye of the beholder. Namely, if it helps us _predict_ what humans do by _ascribing_ to them a reward signal (whether extrinsic or intrinsic) and then _conjecturing_ that humans are RL agents, then so be it.

In such a case, it is purely up to us, the observers, to decide which parts of this _ascribed_ reinforcement signal we call extrinsic and which parts we call intrinsic. But if it is only a terminological question, then why is it important, and why can't we just resolve it arbitrarily?

Vadim (11/22/04) 

Vadim: By "rational" rewards, I suppose I meant "rewards correlated with survival/success".  It makes sense to say "all rewards are intrinsic" in that no rewards come directly from the outside world.  Maybe the issue we're getting at is whether or not there are rewards that exist primarily to support learning/development.  That is, whereas pain appears to remain a motivator throughout our lives, some people appear to become less motivated by the drive to learn.

So, maybe the distinction is between stationary rewards ("extrinsic") and developmentally non-stationary rewards ("intrinsic").

-Michael (11/25/04)

Vadim, I would be interested in your position on the reward hypothesis.

Actually, I think we should all consider the reward hypothesis in the light of intrinsic motivation. Where one goes from here seems to depend strongly on whether or not one accepts that hypothesis.

-Rich 

Michael:

I like your example of people getting less motivated to learn as they become older. However, do we really need to introduce additional rewards to explain this phenomenon? How about the following intrinsic-reward-free explanation:

People are less eager to learn as they become older simply because of the diminishing returns effect: more learning results in smaller gains in the cumulative "real" reward (i.e., related to their "success/survival"). In fact, even negative gains are possible since exploring already well-known areas of the state-action space is inferior to exploiting them.

This mechanism is similar to decreasing the temperature in simulated annealing and does not require the introduction of additional (intrinsic) rewards; a small sketch of such a temperature schedule appears below.

Vadim (11/25/04). 
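To make the analogy concrete, here is a small sketch of the kind of mechanism described above: Boltzmann (softmax) action selection whose temperature decays with experience, so exploration fades over time without any extra reward term. The decay schedule and constants are illustrative assumptions only.

    import numpy as np

    def boltzmann_action(q_values, t, t0=1.0, decay=0.001, t_min=0.01):
        """Pick an action from softmax(Q / temperature), with the temperature
        shrinking over time step t, as in simulated annealing."""
        temperature = max(t_min, t0 / (1.0 + decay * t))
        prefs = np.asarray(q_values, dtype=float) / temperature
        prefs -= prefs.max()                          # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return np.random.choice(len(probs), p=probs)

Early on, a high temperature makes the agent try options that look inferior; as the temperature falls, behavior shifts toward pure exploitation, which matches the diminishing-returns story without positing an intrinsic reward.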
