| Symbol | Meaning |
| --- | --- |
| $t$ | discrete time step |
| $T$ | final time step of an episode |
| $s_t$ | state at $t$ |
| $a_t$ | action at $t$ |
| $r_t$ | reward at $t$, dependent, like $s_t$, on $a_{t-1}$ and $s_{t-1}$ |
| $R_t$ | return (cumulative discounted reward) following $t$ |
| $R_t^{(n)}$ | $n$-step return |
| $R_t^{\lambda}$ | $\lambda$-return |
| $\pi$ | policy, decision-making rule |
| $\pi(s)$ | action taken in state $s$ under deterministic policy $\pi$ |
| $\pi(s,a)$ | probability of taking action $a$ in state $s$ under stochastic policy $\pi$ |
| $\mathcal{S}$ | set of all nonterminal states |
| $\mathcal{S}^{+}$ | set of all states, including the terminal state |
| $\mathcal{A}(s)$ | set of actions possible in state $s$ |
| $\mathcal{P}_{ss'}^{a}$ | probability of transition from state $s$ to state $s'$ under action $a$ |
| $\mathcal{R}_{ss'}^{a}$ | expected immediate reward on transition from $s$ to $s'$ under action $a$ |
| $V^{\pi}(s)$ | value of state $s$ under policy $\pi$ |
| $V^{*}(s)$ | value of state $s$ under the optimal policy |
| $V, V_t$ | estimates of $V^{\pi}$ or $V^{*}$ |
| $Q^{\pi}(s,a)$ | value of taking action $a$ in state $s$ under policy $\pi$ |
| $Q^{*}(s,a)$ | value of taking action $a$ in state $s$ under the optimal policy |
| $Q, Q_t$ | estimates of $Q^{\pi}$ or $Q^{*}$ |
| $\vec{\theta}_t$ | vector of parameters underlying $V_t$ or $Q_t$ |
| $\vec{\phi}_s$ | vector of features representing state $s$ |
| $\delta_t$ | temporal-difference error at $t$ |
| $e_t(s)$ | eligibility trace for state $s$ at $t$ |
| $e_t(s,a)$ | eligibility trace for a state-action pair |
| $\gamma$ | discount-rate parameter |
| $\varepsilon$ | probability of random action in $\varepsilon$-greedy policy |
| $\alpha, \beta$ | step-size parameters |
| $\lambda$ | decay-rate parameter for eligibility traces |
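To see how several of these symbols fit together, here is a minimal sketch of tabular TD($\lambda$) with accumulating traces, where `V[s]` plays the role of $V_t$, `delta` is the TD error $\delta_t$, `e[s]` is the eligibility trace $e_t(s)$, and `alpha`, `gamma`, `lam` are $\alpha$, $\gamma$, $\lambda$. The five-state random-walk environment, its reward scheme, and the function name are illustrative assumptions, not part of the notation table.

```python
import random

def td_lambda_random_walk(num_episodes=200, alpha=0.1, gamma=1.0,
                          lam=0.8, seed=0):
    """Tabular TD(lambda), sketched on a hypothetical 5-state random walk.

    States 1..5 are nonterminal; 0 and 6 are terminal. The policy moves
    left or right with equal probability, and the reward is +1 only on
    reaching state 6 (an illustrative setup, assumed here).
    """
    rng = random.Random(seed)
    V = [0.0] * 7                        # value estimates; terminals stay 0
    for _ in range(num_episodes):
        e = [0.0] * 7                    # eligibility traces, reset each episode
        s = 3                            # start in the middle state
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))     # random action: left or right
            r = 1.0 if s_next == 6 else 0.0      # reward on this transition
            delta = r + gamma * V[s_next] - V[s]  # TD error delta_t
            e[s] += 1.0                  # accumulating trace for visited state
            for i in range(1, 6):        # update all nonterminal states
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam      # decay traces by gamma * lambda
            s = s_next
    return V
```

With $\gamma = 1$ the learned values approach the probabilities of terminating on the right, so they increase from state 1 toward state 5; the traces let each TD error update recently visited states in proportion to how recently they were visited.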