| Symbol | Meaning |
|---|---|
| $t$ | discrete time step |
| $T$ | final time step of an episode |
| $s_t$ | state at $t$ |
| $a_t$ | action at $t$ |
| $r_t$ | reward at $t$, dependent, like $s_t$, on $a_{t-1}$ and $s_{t-1}$ |
| $R_t$ | return (cumulative discounted reward) following $t$ |
| $R_t^{(n)}$ | $n$-step return (Section 7.1) |
| $R_t^{\lambda}$ | $\lambda$-return (Section 7.2) |
| $\pi$ | policy |
| $\pi(s)$ | action taken in state $s$ under deterministic policy $\pi$ |
| $\pi(s,a)$ | probability of taking action $a$ in state $s$ under stochastic policy $\pi$ |
| $\mathcal{S}$ | set of all nonterminal states |
| $\mathcal{S}^{+}$ | set of all states, including the terminal state |
| $\mathcal{A}(s)$ | set of actions possible in state $s$ |
| $\mathcal{P}_{ss'}^{a}$ | probability of transition from state $s$ to state $s'$ under action $a$ |
| $\mathcal{R}_{ss'}^{a}$ | expected immediate reward on transition from $s$ to $s'$ under action $a$ |
| $V^{\pi}(s)$ | value of state $s$ under policy $\pi$ (expected return) |
| $V^{*}(s)$ | value of state $s$ under the optimal policy |
| $V$, $V_t$ | estimates of $V^{\pi}$ or $V^{*}$ |
| $Q^{\pi}(s,a)$ | value of taking action $a$ in state $s$ under policy $\pi$ |
| $Q^{*}(s,a)$ | value of taking action $a$ in state $s$ under the optimal policy |
| $Q$, $Q_t$ | estimates of $Q^{\pi}$ or $Q^{*}$ |
| $\delta_t$ | temporal-difference error at $t$ |
| $e_t(s)$ | eligibility trace for state $s$ at $t$ |
| $e_t(s,a)$ | eligibility trace for a state-action pair |
| $\gamma$ | discount-rate parameter |
| $\alpha$, $\beta$ | step-size parameters |
| $\lambda$ | decay-rate parameter for eligibility traces |
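
As a compact reminder of how several of these symbols fit together, the display below gives the discounted return, the one-step temporal-difference error, and the accumulating eligibility-trace update. This is a sketch assuming the discounted-return and accumulating-trace conventions referenced in the sections cited above:

$$
R_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t),
$$

$$
e_t(s) =
\begin{cases}
\gamma \lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t,\\[2pt]
\gamma \lambda\, e_{t-1}(s) & \text{otherwise.}
\end{cases}
$$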