Reinforcement Learning and Computer Go (RLGO)
Uncertainty
The aim of this page is to define and discuss what we mean by uncertainty in the game of Go.
In Go many things are uncertain.
But what do we mean by this?
Usually we mean that we are making a prediction, because we cannot
evaluate something exactly. This could be because:
- The opponent's policy is unknown
- The opponent's policy may be stochastic
- Our own policy may be stochastic
- Exact evaluation may not be computationally feasible, even with
deterministic policies
For now we assume that the opponent plays according to some fixed,
stochastic policy πT, and that we play
according to some fixed, stochastic policy πU (T for "them", U for "us"). The opponent's policy
need not be known.
To represent a prediction, we must specify a question
that we wish to answer. To evaluate the prediction, we play out a game
following the policies specified by the question and observe the answer
that actually results. To improve the
estimate, we average over many such evaluations (in other words, we play out
the position many times).
[FOOTNOTE: In real games, the opponent's policy may also vary according
to their model of our own policy. However, we could incorporate this
into the above framework by including the opponent's model in the state. The
policy then remains fixed with respect to this augmented state.]
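The evaluation procedure above can be sketched as a Monte Carlo estimate. Simulating Go is beyond a short example, so this sketch assumes a hypothetical toy take-away game as a stand-in: players alternately remove 1 or 2 stones, and whoever takes the last stone wins. The functions `pi_us` and `pi_them` (playing the roles of πU and πT) and their move probabilities are illustrative assumptions. The question asked is "with this many stones and this player to move, do we take the last stone?"; each play-out returns its answer (1 or 0), and `estimate` averages many of them. Because the toy game is tiny, `exact` can also enumerate every outcome, which is exactly what is infeasible in Go.

```python
import random
from functools import lru_cache

# Hypothetical toy stand-in for Go: players alternately remove 1 or 2 stones,
# and whoever takes the last stone wins. A policy maps the stone count to a
# dict of {move: probability}; the numbers below are illustrative assumptions.

def pi_us(stones):
    """Fixed stochastic 'us' policy: take 2 with probability 0.7."""
    return {2: 0.7, 1: 0.3} if stones >= 2 else {1: 1.0}

def pi_them(stones):
    """Fixed stochastic 'them' policy: uniform over legal moves."""
    return {1: 0.5, 2: 0.5} if stones >= 2 else {1: 1.0}

def play_out(stones, us_to_play, rng):
    """Play one game to the end and return the answer: 1 if we take the last stone."""
    while True:
        probs = (pi_us if us_to_play else pi_them)(stones)
        take = rng.choices(list(probs), weights=list(probs.values()))[0]
        stones -= take
        if stones == 0:
            return 1 if us_to_play else 0
        us_to_play = not us_to_play

def estimate(stones, us_to_play, n, seed=0):
    """Monte Carlo estimate: average the answers of n independent play-outs."""
    rng = random.Random(seed)
    return sum(play_out(stones, us_to_play, rng) for _ in range(n)) / n

@lru_cache(maxsize=None)
def exact(stones, us_to_play):
    """Exact answer by enumerating all move sequences (only feasible for tiny games)."""
    probs = (pi_us if us_to_play else pi_them)(stones)
    total = 0.0
    for take, p in probs.items():
        left = stones - take
        if left == 0:
            total += p * (1.0 if us_to_play else 0.0)
        else:
            total += p * exact(left, not us_to_play)
    return total

for n in (10, 100, 10000):
    print(n, estimate(7, True, n))
print("exact", exact(7, True))
```

As `n` grows, the estimate should approach the exact value, with its standard error shrinking roughly as 1/√n.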
Probability and expectation
We often use terms such as probability
and expectation when
discussing Go. But what do these terms really mean?
When we refer to the probability of an event occurring, we are
normally asking whether a binary observation will
become 1 at some point in the future. But once we view the probability
as a question, it is clear that we must specify more information: who
is to play, what policies will be followed by us and by them, and what
timescale we are interested in. Without these specifications, the term probability is ill-defined!
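To make this concrete, here is a minimal sketch that packages a probability question as an explicit object, again assuming a hypothetical toy take-away game (players alternately remove 1 or 2 stones) as a stand-in for Go. The `Question` class, the `greedy` and `uniform` policies, and `win_probability` are illustrative names, not part of any existing library. The binary observation becomes 1 when we take the last stone, and `horizon` caps how many moves ahead we look. Because the game is tiny, the probability is computed exactly rather than by play-outs.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Hypothetical toy stand-in for Go: players alternately remove 1 or 2 stones.
# A policy maps the stone count to a dict of {move: probability}.
Policy = Callable[[int], Dict[int, float]]

def greedy(stones: int) -> Dict[int, float]:
    """Illustrative policy: always take 2 stones when possible."""
    return {2: 1.0} if stones >= 2 else {1: 1.0}

def uniform(stones: int) -> Dict[int, float]:
    """Illustrative policy: take 1 or 2 stones with equal probability."""
    return {1: 0.5, 2: 0.5} if stones >= 2 else {1: 1.0}

@dataclass(frozen=True)
class Question:
    us_to_play: bool  # who is to play
    pi_us: Policy     # our policy
    pi_them: Policy   # their policy
    horizon: int      # timescale: maximum number of moves to look ahead

def win_probability(q: Question, stones: int) -> float:
    """Exact probability that the binary observation 'we took the last stone'
    becomes 1 within q.horizon moves."""
    def p(stones: int, us_to_play: bool, moves_left: int) -> float:
        if moves_left == 0:
            return 0.0  # the observation did not become 1 within the timescale
        policy = q.pi_us if us_to_play else q.pi_them
        total = 0.0
        for take, prob in policy(stones).items():
            if stones - take == 0:
                total += prob * (1.0 if us_to_play else 0.0)
            else:
                total += prob * p(stones - take, not us_to_play, moves_left - 1)
        return total
    return p(stones, q.us_to_play, q.horizon)

base = Question(us_to_play=True, pi_us=greedy, pi_them=uniform, horizon=10)
print(win_probability(base, 5))                            # 1.0
print(win_probability(replace(base, us_to_play=False), 5))  # 0.25
print(win_probability(replace(base, pi_us=uniform), 5))     # 0.5625
print(win_probability(replace(base, horizon=2), 5))         # 0.0
```

Changing any single field (who is to play, our policy, or the timescale) changes the answer for the same event in the same position, so "the probability that we take the last stone" means nothing until all of them are fixed.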
When we refer to the expectation of a value, we are normally
describing the average value that some quantity
will take in the future. Again, we can view this as answering a question;
after all, this is the definition of a question! However, we must again
specify more information: who is to play, what policies will be
followed by us and by them, and what timescale we are interested in.
Without these specifications, the term expectation
is ill-defined!
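The same idea extends to expectations. This sketch keeps the earlier assumptions (a hypothetical toy take-away game standing in for Go, with illustrative `greedy` and `uniform` policies and a hypothetical `Question` class), but the observation is now real-valued: the number of stones we have taken, read off when the game ends or when the horizon is reached, whichever comes first. `expected_stones` computes the expectation exactly by weighting every outcome by its probability.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Hypothetical toy stand-in for Go: players alternately remove 1 or 2 stones
# until none remain. A policy maps the stone count to {move: probability}.
Policy = Callable[[int], Dict[int, float]]

def greedy(stones: int) -> Dict[int, float]:
    """Illustrative policy: always take 2 stones when possible."""
    return {2: 1.0} if stones >= 2 else {1: 1.0}

def uniform(stones: int) -> Dict[int, float]:
    """Illustrative policy: take 1 or 2 stones with equal probability."""
    return {1: 0.5, 2: 0.5} if stones >= 2 else {1: 1.0}

@dataclass(frozen=True)
class Question:
    us_to_play: bool  # who is to play
    pi_us: Policy     # our policy
    pi_them: Policy   # their policy
    horizon: int      # timescale: number of moves after which we observe

def expected_stones(q: Question, stones: int) -> float:
    """Exact expected number of stones we have taken at the end of the game
    or after q.horizon moves, whichever comes first."""
    def e(stones: int, us_to_play: bool, moves_left: int, ours: int) -> float:
        if stones == 0 or moves_left == 0:
            return float(ours)  # time to read off the observation
        policy = q.pi_us if us_to_play else q.pi_them
        return sum(
            prob * e(stones - take, not us_to_play, moves_left - 1,
                     (ours + take) if us_to_play else ours)
            for take, prob in policy(stones).items()
        )
    return e(stones, q.us_to_play, q.horizon, 0)

base = Question(us_to_play=True, pi_us=greedy, pi_them=uniform, horizon=10)
print(expected_stones(base, 5))                            # 3.5
print(expected_stones(replace(base, us_to_play=False), 5))  # 2.25
print(expected_stones(replace(base, horizon=1), 5))         # 2.0
```

From the same position, the "expected number of stones we take" is 3.5, 2.25, or 2.0 depending on who moves first and which timescale is chosen.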
In general, the question framework forces us to be more precise with
our descriptions of uncertainty than when we use the vague but familiar
terms probability and expectation.