Reinforcement Learning and Computer Go (RLGO)

Value function
Most existing Computer Go programs that estimate a value function at all attempt to estimate a score value function. According to the win/lose value function hypothesis, this cannot lead to a strong Computer Go program: a program that chooses its moves to maximise expected score has no concept of the risk associated with a move or position, and so cannot play strong Go.
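The gap between the two objectives can be illustrated with a small simulation. The move names and the uniform score-margin model below are hypothetical, chosen only to make the point: a move with a higher expected score can still have a lower probability of winning.

```python
import random

random.seed(0)

def sample_margin(mean, spread):
    """Hypothetical score-margin model: draw a final score margin
    for a move, uniformly around a mean (positive = we win)."""
    return random.uniform(mean - spread, mean + spread)

# Two hypothetical candidate moves:
#   'safe'  : almost always wins, but only by a few points.
#   'greedy': wins big when it works, loses outright otherwise.
moves = {"safe": (2, 3), "greedy": (10, 30)}

def evaluate(move, n=100_000):
    """Estimate expected score and win rate by Monte Carlo sampling."""
    margins = [sample_margin(*moves[move]) for _ in range(n)]
    expected_score = sum(margins) / n
    win_rate = sum(m > 0 for m in margins) / n
    return expected_score, win_rate

for name in moves:
    score, wins = evaluate(name)
    print(f"{name:6s} expected score {score:+.2f}, win rate {wins:.2f}")
```

Under these assumed distributions, the expected-score criterion prefers the greedy move while the win/lose criterion prefers the safe one, which is exactly the risk the score value function cannot see.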
Some Computer Go programs use an unspecified heuristic function to rank and select moves. According to the hypothesis, such a program can only play strong Go if the heuristic function approximates a value function, and in particular the win/lose value function. An arbitrary heuristic makes no prediction about the future value of a position, so its judgements cannot be verified later in the game, which prevents the heuristic from being tested and improved.
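By contrast, a heuristic interpreted as an estimate of the win/lose value function is verifiable: once the game ends, the outcome is known, and every evaluation made along the way can be nudged toward it. The sketch below shows one minimal, hypothetical way to do this (a Monte Carlo value update); the position labels and the learning rate are illustrative assumptions, not anything from the original text.

```python
def update_values(values, positions, outcome, alpha=0.1):
    """Move each visited position's estimated win probability
    toward the observed game outcome (1 = win, 0 = loss)."""
    for pos in positions:
        v = values.get(pos, 0.5)           # prior: 50% win probability
        values[pos] = v + alpha * (outcome - v)
    return values

values = {}
# Two hypothetical games passing through the same opening position "A":
update_values(values, ["A", "B"], outcome=1)   # game 1: won
update_values(values, ["A", "C"], outcome=0)   # game 2: lost
print(values)   # "A" has been pulled toward both observed outcomes
```

Because each estimate is compared against a ground-truth outcome, the evaluation function can improve with experience, which is precisely what an unverifiable heuristic cannot do.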
Note that the win/lose value function hypothesis is a specialised form of the value function hypothesis.