Reinforcement Learning and Artificial Intelligence is Solved
These are some notes by Rich Sutton on the topic of artificial
intelligence and what more remains before we can consider it
solved. The notes are from Dec 2003, but they concern ideas we
are still investigating, namely intrinsic motivation and knowledge
representation. The ambition of this page is to get people to think more about these ideas and comment on them here.
One could say that the problem of AI is solved, essentially solved, by
RL as in the textbook. We can do learning. We can do
planning. We understand the way these do, and do not,
interact. We can gain knowledge about the world and use it
flexibly, as in Dyna. What more could you want? You might
want to do all these things more efficiently, but perhaps that is a
detail. Are we really done with the what and the why of what is
computed? I like this question even though the answer,
ultimately, is no. It reminds us of our ambition and guides us in
the right future directions.
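The learning-plus-planning combination referred to above, as in Dyna, can be sketched in a few lines. This is only an illustrative tabular version; the function names, the corridor world, and all parameter values are my own for the example, not anything from the textbook:

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, episodes=30, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, start=0, max_steps=200):
    """Tabular Dyna-Q: learn from real experience, remember it in a model,
    and use the model to generate extra simulated updates (planning)."""
    Q = defaultdict(float)   # Q[(state, action)] action-value estimates
    model = {}               # model[(state, action)] = (reward, next_state, done)
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # epsilon-greedy action selection with random tie-breaking
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                best = max(Q[(s, a_)] for a_ in actions)
                a = random.choice([a_ for a_ in actions if Q[(s, a_)] == best])
            r, s2, done = env_step(s, a)
            # (1) direct RL: Q-learning update from real experience
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # (2) model learning: record what the world just did
            model[(s, a)] = (r, s2, done)
            # (3) planning: replay simulated experience drawn from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr if pdone else pr + gamma * max(Q[(ps2, a_)] for a_ in actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            if done:
                break
            s = s2
    return Q

def corridor(s, a):
    """Toy world for the sketch: a five-state corridor. Action 1 moves right,
    action 0 moves left; reward 1 on reaching state 4, which ends the episode."""
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return (1.0 if s2 == 4 else 0.0), s2, s2 == 4
```

The point of the sketch is only the three-way split: the same update rule is applied to real experience and to experience simulated from the learned model, which is how knowledge of the world gets used flexibly for planning.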
Before we have completed the outline of AI, we will have to fully
address at least one more major issue. Let us call it the issue
of knowledge. What kind of knowledge should there be? This
question is the ultimate computational theory question, the ultimate
what? and why?
What is knowledge? I would like to say that it is always
experience or compilations of experience, but that is too
strong. There is also policy knowledge. And one would
think there is also knowledge that is just compiled computations, such
as “I thought about that a long time and I couldn’t find any way to
make it work”. But right now I am thinking about that most basic
kind of knowledge, that about the way the world behaves, not about
ourselves. This is "what would happen if" knowledge. Let us
call this world knowledge.
The problem of world knowledge defies the conventional separation into
state and transition knowledge. There is just prediction and
accuracy of prediction, and ability to predict key events of
interest. There is no state-wise partition of what you know.
But then what should you know? What should you predict? The
null hypothesis would have to be that what we strive to predict would
be ultimately determined by our genetic inheritance, and that other
things would become of interest because of their relationship to these.
Thus, things can become of interest, for prediction, for any of three reasons:
1. Because they have been predesignated, perhaps arbitrarily, as being
of interest. Rewards are like this, but not just rewards.
We might view these other things as the designer’s guesses about what
might, at some point in the future, be useful toward obtaining rewards.
2. Because they have been found to be causally related, or sometimes causally related, to things of interest for Reason 1 above.
3. Because it has been found that they can be learned about. This
is in large part a modulation of the first two. Those are
intrinsic reasons for interest. This one is about the
fruitfulness of trying to learn about them. This is
curiosity. This is the reward that comes just from learning itself.
Ok. So world knowledge is the ability to predict inputs, or more
typically functions of inputs, that are of interest as just defined.
The functions of inputs might be things like their discounted cumulated
sum over time (even for non-rewards). Such composite measures may
be far more important to us than individual signals at particular times.
The individual inputs should be called observations.
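Such a discounted cumulated sum of a signal can be computed with a single backward pass. A small sketch (the function name and discount value are illustrative, and the signal here is any sequence of observations, not necessarily rewards):

```python
def discounted_sum(signal, gamma=0.9):
    """For each time t, the discounted cumulated sum of the signal from t onward:
    G_t = x_t + gamma * x_{t+1} + gamma^2 * x_{t+2} + ...
    Computed in one backward pass using G_t = x_t + gamma * G_{t+1}."""
    G = 0.0
    out = []
    for x in reversed(signal):
        G = x + gamma * G
        out.append(G)
    return out[::-1]
```

For example, a signal that is 1 only at the final step yields geometrically fading predictions at earlier steps, which is what makes such composite measures temporally abstract.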
This brings up an important question: Does it necessarily all
come down to predicting next observations? Or is there a
meaningful alternative at a higher scale? [below it is proposed
that the answers to these two questions are YES and NO.]
I think there is a clear, identifiable, big-science kind of problem in
the creation of a world model, also known as a large-ish collection of
knowledge, that is completely grounded in a causal, temporal sequence
of observations and actions, sensors and effectors. Where all
knowledge is predictions about the future of the sequence. A
grand challenge. We know in some sense that this must be
possible. It is a direct challenge which should be accessible
even to people without knowledge of RL. And we want the knowledge
in a form suitable for planning. It should at least permit
simulation of future experience, presumably at a high level.
This grand challenge could be a good basis for collaborative work.
Grounded world knowledge
Knowledge is predictions
Bits to bits. Data to data.
It has appeal for roboticists, for psycho-philosophers like me, via the
emphasis on experience and on having a life, via the call for
verification, via the call for pulling the parts of AI together.
Let us call it the Grounded World Modeling Problem (GWMP). It has these key features:
1. You have sensors rather than state information.
2. You want the model to be suitable for planning.
3. You want it to be learnable/verifiable (because it is grounded).
4. It can express a wide range of world knowledge.
5. All the knowledge is expressed as predictions about future experience.
A conceptual breakthrough (perhaps) in the predictive modeling
problem: There is an outstanding question: do all predictive
statements come down, ultimately, to one-step predictions? Of
course there are multi-step predictions. But to evaluate them,
does it always come down to the accuracy of next-step predictions?
I am thinking more and more that the answer is YES. No more
complex/structured/interesting notion is needed. All the rest can
be done by TD. In particular, we may be able to define K (the
quantity of knowledge in a model, aka the accuracy of a model) as the
expected accuracy of the sequence of one-step predictions given the
equiprobable policy. Transient K can be handled as in
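The proposed definition of K can be estimated directly by sampling: run the equiprobable random policy and average the probability the model assigns to whatever observation actually occurs next. This is only a sketch of that definition; the function names and the toy ring world are mine, not part of the proposal:

```python
import random

def knowledge_K(env_step, model_prob, actions, start=0, steps=5000, seed=1):
    """Monte-Carlo estimate of K, the quantity of knowledge in a model:
    the expected probability the model assigns to the observation that
    actually occurs next, with actions chosen equiprobably at random."""
    rng = random.Random(seed)
    s, total = start, 0.0
    for _ in range(steps):
        a = rng.choice(actions)         # equiprobable random policy
        s2 = env_step(s, a, rng)
        total += model_prob(s, a, s2)   # model's probability of the true next observation
        s = s2
    return total / steps

# Illustrative world: a deterministic 3-state ring; the action is the step size.
ring = lambda s, a, rng: (s + a) % 3
perfect = lambda s, a, s2: 1.0 if s2 == (s + a) % 3 else 0.0  # knows the world
uniform = lambda s, a, s2: 1.0 / 3.0                          # knows nothing
```

A perfect model scores K = 1, and a model that just guesses uniformly scores 1/3 in this world, so K behaves as an accuracy-of-model measure.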
The beginnings of a new idea: We have a proposed definition of K
from yesterday. But there remains the question of the action
selections. It seems inadequate to base K on the equiprobable
random policy. This leaves us caring about all sorts of crazy
random dances that have no point, that don’t get us anywhere.
The beginning of a solution is to note these last few phrases.
We care not so much about prediction for all possible actions as about
being able to cause all possible sensations. If we have learned
one set of ways of behaving which lets us control the sensations
completely, perhaps produce any desired/possible sensation sequence,
then we have learned all we need to know about the world. Note
that there may be much more to learn. We may not know the
sensations which would follow many dances, but we do not need to.
We know how to absolutely control, if not predict, all the bits of
interest. This is all that one could ever need in any subsequent task.
In talking with Satinder again, we refined this one more step.
The above criterion for full knowledge is too strong in that it asks
for complete control whereas typically this will not be possible.
Suppose we had a sufficient statistic. This means that we
can predict the probability of any sequence or, equivalently, that we
can predict the probability of any next observation after any possible
sequence. If we can control the observations as well as one could
with this, then we say we have full knowledge of the world.
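One crude stand-in for such a sufficient statistic is a fixed-length window of recent observations: if the window is long enough for the world in question, the estimated conditionals determine the probability of any next observation after any possible sequence. A sketch along those lines, with all names my own:

```python
from collections import defaultdict

class HistoryPredictor:
    """Estimates P(next observation | last k observations) from experience.
    The length-k window stands in for a sufficient statistic of the sequence:
    when k suffices for the world, these one-step conditionals determine the
    probability of any observation sequence."""
    def __init__(self, k=1):
        self.k = k
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, sequence):
        """Count next-observation outcomes after every length-k window."""
        for t in range(self.k, len(sequence)):
            hist = tuple(sequence[t - self.k:t])
            self.counts[hist][sequence[t]] += 1

    def prob(self, history, obs):
        """Estimated probability that obs follows the given history."""
        hist = tuple(history[-self.k:])
        total = sum(self.counts[hist].values())
        return self.counts[hist][obs] / total if total else 0.0
```

On a strictly alternating stream like "abab...", a window of one observation is already sufficient, and the predictor's conditionals become exact.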
From: Michael Littman
Date: November 4, 2004 4:10:03 PM MST
In your message, you asked whether all knowledge prediction is
one-step prediction. I think a better question might be, is one-step
prediction accuracy a sufficient signal for learning? I believe the
answer to this question is no. Here's a thought experiment.
First, consider the well-defined problem of predicting the next
character in an English text. Shannon and others used this task as a
way to estimate the entropy of English. Current estimates are roughly
1.1 bits per character. Machine learning methods can achieve on the
order of 1.22 bits per character. This doesn't look like a
significant difference. It certainly doesn't look as significant as
the difference in understanding and world knowledge between a person
and current machine-learning methods.
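The bits-per-character figures being compared here are cross-entropies of a predictive model on text. A minimal sketch of the measurement, with a context-free unigram baseline (the function names are mine, and a serious model would of course condition on the prefix):

```python
import math
from collections import Counter

def bits_per_character(text, model_probs):
    """Cross-entropy of a predictive model on a text, in bits per character.
    model_probs(prefix) returns a dict mapping each possible next character
    to its predicted probability."""
    total = 0.0
    for t in range(len(text)):
        p = model_probs(text[:t]).get(text[t], 0.0)
        if p <= 0.0:
            return float('inf')   # the model ruled out something that happened
        total -= math.log2(p)
    return total / len(text)

def unigram_model(train_text):
    """Context-free baseline: predict each character by its overall frequency,
    ignoring the prefix entirely."""
    counts = Counter(train_text)
    n = len(train_text)
    probs = {c: counts[c] / n for c in counts}
    return lambda prefix: probs
```

The thought experiment's point survives the sketch: two models can sit within a tenth of a bit of each other on this measure while differing enormously in understanding.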
Further, imagine if Stephen Hawking and I were both tested on
cosmology papers. I suspect the difference in per-character entropy
would be insignificant in spite of the fact that his understanding
would be much deeper and more substantial than mine. A learning system
with the drive of improving its per-character entropy wouldn't have
much motivation to learn deep understanding. A little perhaps, but
hardly enough to get excited about.
On the other hand, consider this example, due to Geoff Gordon. (We
had a chat at the symposium on this topic.) We can present Hawking
and me a cosmology paper character by character and repeatedly ask
"will the next theorem use an application of the uncertainty
principle?" This is a prediction that a neutral observer could verify
and would likely show a very significant difference between the two of us.
I'd like to highlight a few things from this example. First, the
prediction is abstract, both in temporality and in the knowledge
level. I think this is good and it agrees with our intuitions about
the importance of abstraction in creating and using knowledge.
Second, the verifier itself appears to need high-level knowledge.
This seems to be a stumbling block. Can we come up with a way of
dealing with this issue? I guess one thing we could use for leverage
is the fact that the prediction verifier doesn't need as much
knowledge as the predictor. For example, a lesser physicist than
Hawking can check whether the predictions are correct. In fact, I
could probably check my own predictions (and use these checks as a
signal for learning).
So, the prediction story, the knowledge story, and the abstraction
story are intertwined in a way that we should try to articulate.