
Summary of Notation

$t$                  discrete time step
$T$                  final time step of an episode
$s_t$                state at $t$
$a_t$                action at $t$
$r_t$                reward at $t$, dependent, like $s_t$, on $a_{t-1}$ and $s_{t-1}$
$R_t$                return (cumulative discounted reward) following $t$
$R_t^{(n)}$          $n$-step return (Section 7.1)
$R_t^{\lambda}$      $\lambda$-return (Section 7.2)
$\pi$                policy, decision-making rule
$\pi(s)$             action taken in state $s$ under deterministic policy $\pi$
$\pi(s,a)$           probability of taking action $a$ in state $s$ under stochastic policy $\pi$
$\mathcal{S}$        set of all nonterminal states
$\mathcal{S}^+$      set of all states, including the terminal state
$\mathcal{A}(s)$     set of actions possible in state $s$
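
For quick reference, the return quantities above are defined in the text (Chapters 3 and 7) as

\begin{align*}
R_t &= r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \;=\; \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \\
R_t^{(n)} &= r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^n V_t(s_{t+n}), \\
R_t^{\lambda} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} R_t^{(n)},
\end{align*}

with the sums truncating at the final step $T$ in episodic tasks.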
   
$\mathcal{P}^a_{ss'}$   probability of transition from state $s$ to state $s'$ under action $a$
$\mathcal{R}^a_{ss'}$   expected immediate reward on transition from $s$ to $s'$ under action $a$
$V^\pi(s)$           value of state $s$ under policy $\pi$ (expected return)
$V^*(s)$             value of state $s$ under the optimal policy
$V$, $V_t$           estimates of $V^\pi$ or $V^*$
$Q^\pi(s,a)$         value of taking action $a$ in state $s$ under policy $\pi$
$Q^*(s,a)$           value of taking action $a$ in state $s$ under the optimal policy
$Q$, $Q_t$           estimates of $Q^\pi$ or $Q^*$
$\vec{\theta}_t$     vector of parameters underlying $V_t$ or $Q_t$
$\vec{\phi}_s$       vector of features representing state $s$
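
These transition and value quantities are tied together by the Bellman equation for $V^\pi$ (Chapter 3), shown here for quick reference:

\begin{equation*}
V^\pi(s) \;=\; \sum_{a \in \mathcal{A}(s)} \pi(s,a) \sum_{s'} \mathcal{P}^a_{ss'} \Bigl[ \mathcal{R}^a_{ss'} + \gamma V^\pi(s') \Bigr],
\end{equation*}

and similarly $Q^\pi(s,a) = \sum_{s'} \mathcal{P}^a_{ss'} \bigl[ \mathcal{R}^a_{ss'} + \gamma V^\pi(s') \bigr]$.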
   
$\delta_t$           temporal-difference error at $t$
$e_t(s)$             eligibility trace for state $s$ at $t$
$e_t(s,a)$           eligibility trace for a state-action pair
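
For quick reference, the TD error and the (accumulating) state trace are computed as in Chapters 6 and 7:

\begin{align*}
\delta_t &= r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t), \\
e_t(s) &=
\begin{cases}
\gamma \lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t, \\
\gamma \lambda\, e_{t-1}(s) & \text{otherwise.}
\end{cases}
\end{align*}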
   
$\gamma$             discount-rate parameter
$\varepsilon$        probability of random action in $\varepsilon$-greedy policy
$\alpha$, $\beta$    step-size parameters
$\lambda$            decay-rate parameter for eligibility traces
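
As a rough illustration of how these symbols fit together, here is a minimal Python sketch of tabular online TD($\lambda$) prediction of $V^\pi$. The environment interface (reset/step) and the policy function are assumptions made for the example, not part of the book's notation.

from collections import defaultdict

def td_lambda(env, policy, num_episodes, alpha=0.1, gamma=0.9, lam=0.8):
    # Tabular online TD(lambda) prediction of V^pi.
    # Assumed interface (illustrative only): env.reset() -> s,
    # env.step(a) -> (s_next, r, done); policy(s) -> a.
    V = defaultdict(float)              # estimate of V^pi(s), initialized to 0
    for _ in range(num_episodes):
        e = defaultdict(float)          # eligibility traces e(s), reset each episode
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # TD error: delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e[s] += 1.0                 # accumulating trace for the visited state
            for x in list(e):           # update every state in proportion to its trace
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam
            s = s_next
    return V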

