RLAI open web page

How does segmentation fit into this framework? For example, how do we ask the question "what is the group of stones connected to x?" David Silver, Sat Oct 9 22:09:12 2004

One answer to this is to use implicit segmentation. The group of stones connected to x is described implicitly by a scope operator representing the required correlation to x. When we wish to ask a question about the group of stones connected to x, we use a policy that restricts moves to the group around x.

Note that a correlation feature can be constructed by answering the question "will these two intersections be part of the same region at the end of the game?". As usual, we can specify different policies for the correlation (e.g. whether we are trying to maximise correlation or whether we have some other aim).

You say that this is a specialized version of the Empirical Knowledge Hypothesis. It might be even more related to the Predictive Representations Hypothesis. The web page for that has not been written yet, but it is essentially that predictive features--the answers to questions about the future--may be particularly good when it comes to generalization between states.

But I would not say all useful features have to be answers to predictive questions. That seems too strong and too limiting. An extreme case is just the locations of the stones. These features are the basis for all the others, but are not themselves predictive, just descriptive. I can also imagine that there might be other things, non-predictive functions of them, that might be useful. The total number of stones on the board, for example. Rich Sutton, Wed Oct 13 22:58:42 2004

Current observations (such as the location of stones, or the number of stones on the board) can also be described in the question framework, simply by setting all weights to zero except the current time-step. I guess the misleading part is that 'the present' is included in 'the future' as a special case. I have updated the wording of the question hypothesis to make this clearer. David Silver, Thu Oct 14 22:01:43 2004
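
To make the "all weights zero except the current time-step" point concrete, here is a minimal Python sketch. It assumes an answer can be written as a weighted sum of a feature's values along a trajectory of states; a single fixed trajectory stands in for the expectation under the players' policies. The names `answer` and `stone_count`, the board encoding, and the toy trajectory are all hypothetical illustrations, not part of the RLGO framework.

```python
from typing import Callable, Sequence

State = Sequence[int]  # hypothetical board encoding: 1 = stone, 0 = empty


def answer(trajectory: Sequence[State],
           feature: Callable[[State], float],
           weights: Sequence[float]) -> float:
    """Weighted sum of feature values along one trajectory (assumed form)."""
    return sum(w * feature(s) for w, s in zip(weights, trajectory))


def stone_count(state: State) -> float:
    """Descriptive feature: the number of stones on the board."""
    return float(sum(state))


# Toy trajectory: the current state followed by two future states.
trajectory = [[1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 1, 1]]

# All weight on the current time-step: the question reduces to an observation.
present_only = answer(trajectory, stone_count, [1.0, 0.0, 0.0])

# All weight on the last time-step: a genuinely predictive question.
final_only = answer(trajectory, stone_count, [0.0, 0.0, 1.0])
```

In this sketch the same `answer` function covers both cases; only the weight vector decides whether the result describes the present or predicts the future.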

I think we need better definitions of what constitutes a 'useful feature' and what should be accepted as 'questions and answers'.

First of all, what constitutes a useful feature?
Is a feature useful if it helps to improve our play? Or is a feature useful if it can be used in an easy calculation (linear space and time in the number of features?) to compute an optimal (or just 'good enough'??) move?

Now, about questions/answers:
If I understand correctly, the essential components of a question are:
- A binary valued(?) function defined over a state/state-transition,
- some policies assumed for the players,
- and some nonnegative weights (why should the weights be normalized??)

I think it is critical to say something about which policies we are allowed to choose. If there is no restriction on this set, then the `question hypothesis' can be shown to hold:

Simply choose the optimal policies for both players and ask the question:
- the binary valued function takes value 1 iff player #1 wins
- the policies are the optimal policies for both players
- the weights are all one (could use discounting, but why?)

Another question is obtained by changing the binary valued function to one that takes the value 1 iff player #2 wins.

Same for draw.

Now, the answers to these questions just give us everything we need to know about the positions in order to play optimally.
These are useful features!
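
The construction above can be sketched on a toy game. This is only an illustration under stated assumptions: the game (take 1 or 2 stones from a pile, and whoever takes the last stone wins) is hypothetical and has no draws, and the single question is phrased for the player to move rather than separately for player #1 and player #2. The names `mover_wins` and `best_move` are made up for the example.

```python
from functools import lru_cache

MOVES = (1, 2)  # hypothetical toy game: take 1 or 2 stones; taking the last stone wins


@lru_cache(maxsize=None)
def mover_wins(pile: int) -> bool:
    """Answer to: does the player to move win if both players play optimally?"""
    # The mover wins if some legal move leaves the opponent in a losing position.
    return any(m <= pile and not mover_wins(pile - m) for m in MOVES)


def best_move(pile: int) -> int:
    """Pick a move using only the answer feature of the successor positions.

    Assumes pile >= 1, so that at least one move is legal.
    """
    legal = [m for m in MOVES if m <= pile]
    for m in legal:
        if not mover_wins(pile - m):  # the opponent moves next and loses
            return m
    return legal[0]  # every move loses against optimal play, so pick any
```

Here the answer to one question, evaluated at the successor positions, is enough to play optimally, which is why an unrestricted choice of policies makes the hypothesis hold trivially.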

Hence, in order to make the `question hypothesis' more interesting, we should maybe restrict what policies we are allowed to choose in the questions.

Still, even if we did choose such a policy set, I have the feeling that it will be hard to make the `question hypothesis' a non-trivial hypothesis (it either fails too easily, or can be shown to hold easily because we have so much flexibility). Csaba, Wed Sep 13 21:50:21 2006

	Reinforcement Learning and Computer Go (RLGO)
	The Question Hypothesis