
Summary of Notation


$t$    discrete time step

$T$    final time step of an episode

$s_t$    state at $t$

$a_t$    action at $t$

$r_t$    reward at $t$, dependent, like $s_t$, on $a_{t-1}$ and $s_{t-1}$

$R_t$    return (cumulative discounted reward) following $t$ (see the defining relations below)

$R_t^{(n)}$    $n$-step return (Section 7.1)

$R_t^{\lambda}$    $\lambda$-return (Section 7.2)

$\pi$    policy

$\pi(s)$    action taken in state $s$ under deterministic policy $\pi$

$\pi(s,a)$    probability of taking action $a$ in state $s$ under stochastic policy $\pi$

$\mathcal{S}$    set of all nonterminal states

$\mathcal{S}^+$    set of all states, including the terminal state

$\mathcal{A}(s)$    set of actions possible in state $s$

$\mathcal{P}^a_{ss'}$    probability of transition from state $s$ to state $s'$ under action $a$

$\mathcal{R}^a_{ss'}$    expected immediate reward on transition from $s$ to $s'$ under action $a$

$V^{\pi}(s)$    value of state $s$ under policy $\pi$ (expected return)

$V^{*}(s)$    value of state $s$ under the optimal policy

$V$, $V_t$    estimates of $V^{\pi}$ or $V^{*}$

$Q^{\pi}(s,a)$    value of taking action $a$ in state $s$ under policy $\pi$

$Q^{*}(s,a)$    value of taking action $a$ in state $s$ under the optimal policy

$Q$, $Q_t$    estimates of $Q^{\pi}$ or $Q^{*}$

$\delta_t$    temporal-difference error at $t$

$e_t(s)$    eligibility trace for state $s$ at $t$

$e_t(s,a)$    eligibility trace for a state-action pair

$\gamma$    discount-rate parameter

$\alpha$, $\beta$    step-size parameters

$\lambda$    decay-rate parameter for eligibility traces
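For reference, the derived quantities listed above satisfy the following relations, written in the conventions of the surrounding chapters. The eligibility-trace update shown is the accumulating-trace variant; the text also discusses a replacing-trace form.

\begin{align*}
R_t &= r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
     = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \\
R_t^{(n)} &= r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n}
     + \gamma^n V_t(s_{t+n}) \\
R_t^{\lambda} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} R_t^{(n)} \\
\delta_t &= r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t) \\
e_t(s) &=
\begin{cases}
\gamma \lambda \, e_{t-1}(s) + 1 & \text{if } s = s_t \\
\gamma \lambda \, e_{t-1}(s) & \text{otherwise}
\end{cases}
\end{align*}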


