- We use the terms agent, environment, and action
instead of the engineers' terms controller, controlled system (or plant), and
control signal because they are meaningful to a wider audience.
- We restrict attention to
discrete time to keep things as simple as possible, even though many of
the ideas can be extended to the continuous-time case (e.g., see Bertsekas and
Tsitsiklis, 1996; Werbos, 1992; Doya).
- We use $r_{t+1}$ to denote the immediate reward for an action taken at time step
$t$, instead of the more common $r_t$, because it emphasizes that
the next reward and the next state are jointly determined.
- Better places for imparting
this kind of prior knowledge are the initial policy or value function, or
influences on these. For example, see Lin (1993),
Maclin and Shavlik (1994), and
- Episodes are often called
"trials" in the literature.
- Ways to formulate tasks that are both continual and
undiscounted are subjects of current research (e.g., Mahadevan,
1993; Tadepalli and Ok, 1994). Some of the
ideas are discussed in Section 6.7.
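
The note above on reward indexing says that the next reward and the next state are jointly determined. A minimal sketch of what that could look like in code follows. The two-state environment, the `step` and `run_episode` names, and the 0.8 success probability are all hypothetical and chosen only for illustration. The point is that a single call returns both quantities, so the reward produced by the action at step t is naturally stored at index t+1, alongside the state at t+1.

```python
import random


def step(state, action, rng):
    """Hypothetical two-state environment (not from the text).

    A single call returns both the next state and the reward,
    so the two are determined jointly by the same transition.
    """
    # One random draw decides whether the action takes effect.
    success = rng.random() < 0.8
    next_state = (state + action) % 2 if success else state
    # The reward depends on the state that was just reached.
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def run_episode(num_steps, seed=0):
    rng = random.Random(seed)
    states = [0]      # states[t] holds the state at time step t
    rewards = [None]  # rewards[0] is unused: no action precedes step 0
    for t in range(num_steps):
        action = 1  # fixed action, for illustration only
        next_state, reward = step(states[t], action, rng)
        states.append(next_state)  # state at time step t+1
        rewards.append(reward)     # reward at time step t+1
    return states, rewards


states, rewards = run_episode(5)
```

With this layout, `states[t + 1]` and `rewards[t + 1]` always come from the same call to `step`, which is the pairing the indexing convention is meant to highlight.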
Sat May 31 13:56:52 EDT 1997