Reinforcement Learning and Artificial Intelligence (RLAI)

Exercise 2.55  (28 pts.)  Plotting recency-weighted averages

Equation 2.5 is a key update rule that we will use throughout the course.  This exercise gives you a hands-on feel for how it works.  It has ten parts; please label them clearly (1)-(10).  Each graph is worth two points and each question one point.  Get a piece of graph paper (or print this pdf file) and prepare to plot by hand.  Do not use a computer or a calculator; it is enough to work out the values approximately and qualitatively, and then plot them by hand.

(1) (5 pts.) Make a vertical axis one inch high that runs from 0 to 1 and a horizontal axis that runs from 1 to 18.  Suppose the estimate starts at 0 and the step size (in the equation) is 0.5.  Suppose the target is 1.0 for 6 steps.  Plot both the target and the estimate points, and connect the points of each type with lines.  Thus, on time step 1 the estimate will be 0 and the target will be 1.  (The estimate should be zero on the first time step, and there should be no time step 0.)  To check that you have the timing right, look at any time step: the vertical distance between the target and the estimate on that time step should correspond to the error used to produce the estimate on the next time step.
How close is the estimate to 1.0 after 6 steps?  Without plotting or using a calculator, how close would it be after 10 steps, assuming the target continues to be 1 (symbolic answer preferred)?  After 20 steps?
Suppose instead that, after the 6 steps, the target changes to 0.  Plot the target and the estimate for 6 more steps.
Now suppose the target alternates between 0 and 1 (starting with 1) for the next 6 steps.  Plot the target and then the trajectory of the estimate over this time (qualitatively).  This case is similar to that of a noisy target.
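
To check a hand-drawn plot afterwards, the trajectory for part (1) can be sketched in Python.  This is only a minimal sketch, assuming Equation 2.5 has the form NewEstimate = OldEstimate + StepSize * (Target - OldEstimate); the function name `track` and the list layout are illustrative choices, not part of the exercise.

```python
def track(targets, step_size, initial=0.0):
    """Apply the assumed update once per target.

    estimates[0] is the estimate on time step 1 (before any update);
    estimates[k] is the estimate after k updates, i.e. on time step k + 1.
    """
    estimates = [initial]
    for target in targets:
        error = target - estimates[-1]  # error seen on this time step
        estimates.append(estimates[-1] + step_size * error)
    return estimates


# Target trajectory from part (1): 1 for six steps, 0 for six steps,
# then alternating 1, 0 for six steps (18 time steps in total).
targets = [1.0] * 6 + [0.0] * 6 + [1.0, 0.0] * 3
estimates = track(targets, step_size=0.5)

# zip stops at 18 entries, so each row pairs a time step's target
# with the estimate held on that same time step.
for t, (target, estimate) in enumerate(zip(targets, estimates), start=1):
    print(f"t={t:2d}  target={target:.0f}  estimate={estimate:.4f}")
```

With a constant target of 1 and the estimate starting at 0, each update multiplies the remaining error by (1 - StepSize), so after n updates the error is (1 - StepSize)^n, which is 0.5^n here.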

(2) (2 pts.) Start over with a new graph and the estimate again at zero, this time with a step size of 1/8.  Repeat the target trajectory as above. 

(3) (3 pts.) Make a third graph with a step size of 1.0 and repeat.  Which step size produces estimates of smaller error when the target is alternating?

(4) (4 pts.) Make a fourth graph with a step size of 1/t (i.e., the first step size is 1, the second is 1/2, the third is 1/3, etc.)  Repeat the target trajectory.  Based on these graphs, why is the 1/t step size appealing?  Why is it not always the right choice?
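
The 1/t schedule in part (4) can be checked in the same spirit once the graph is drawn.  This is a minimal sketch, again assuming the update NewEstimate = OldEstimate + StepSize * (Target - OldEstimate); `track_decaying` is a hypothetical helper name.  With this schedule the estimate after t updates equals the plain average of the first t targets, which the loop at the end verifies.

```python
def track_decaying(targets, initial=0.0):
    """Run the assumed update with step size 1/t on the t-th update."""
    estimates = [initial]
    for t, target in enumerate(targets, start=1):
        step_size = 1.0 / t  # 1, 1/2, 1/3, ...
        estimates.append(estimates[-1] + step_size * (target - estimates[-1]))
    return estimates


# Same 18-step target trajectory as in part (1).
targets = [1.0] * 6 + [0.0] * 6 + [1.0, 0.0] * 3
estimates = track_decaying(targets)

# After t updates the estimate matches the mean of the first t targets.
for t in range(1, len(targets) + 1):
    mean_so_far = sum(targets[:t]) / t
    assert abs(estimates[t] - mean_so_far) < 1e-12
    print(f"after {t:2d} updates  estimate={estimates[t]:.4f}")
```

Because the first step size is 1, the first update replaces the initial estimate entirely, so the starting value has no effect on later estimates.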

(5)-(8) (9 pts.) You might think that the step size should be between 0 and 1.  Make plots for the first 6 steps with step sizes of -1/2, 1.5, 2.0, and 2.5 (adjust the range of the vertical axis appropriately).  What is the safe range for the step size?
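
The four step sizes in parts (5)-(8) can be compared numerically after the plots are drawn.  This is a minimal sketch, assuming the update NewEstimate = OldEstimate + StepSize * (Target - OldEstimate) and the first 6 steps of the earlier trajectory (target fixed at 1, estimate starting at 0); `first_six` is a hypothetical helper name.

```python
def first_six(step_size, target=1.0, initial=0.0, n_updates=6):
    """Estimates for a fixed target; index k is the value after k updates."""
    estimates = [initial]
    for _ in range(n_updates):
        estimates.append(estimates[-1] + step_size * (target - estimates[-1]))
    return estimates


for step_size in (-0.5, 1.5, 2.0, 2.5):
    estimates = first_six(step_size)
    final_error = abs(1.0 - estimates[-1])
    print(f"step size {step_size:4.1f}: "
          f"{[round(e, 3) for e in estimates]}  |error| = {final_error:.4f}")
```

With a fixed target, each update multiplies the error by (1 - StepSize), so whether the estimate closes in on the target depends on whether that factor has magnitude below 1.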

(9)-(10) (5 pts.) Finally, suppose the step size is 1, and the target is defined as 1 plus half the current estimate.  Plot the estimate for 6 steps.  Repeat with a step size of 1.5.  Which of these two cases approaches the asymptotic value faster?
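
Parts (9)-(10) differ from the earlier ones in that the target moves with the estimate.  A minimal sketch of that setup follows, assuming the update NewEstimate = OldEstimate + StepSize * (Target - OldEstimate) and, since this part does not say otherwise, an estimate that starts at 0; `track_moving_target` is a hypothetical helper name.

```python
def track_moving_target(step_size, initial=0.0, n_updates=6):
    """Target is recomputed from the current estimate before each update."""
    estimates = [initial]  # starting at 0 is an assumption
    for _ in range(n_updates):
        target = 1.0 + 0.5 * estimates[-1]  # 1 plus half the current estimate
        estimates.append(estimates[-1] + step_size * (target - estimates[-1]))
    return estimates


for step_size in (1.0, 1.5):
    estimates = track_moving_target(step_size)
    print(f"step size {step_size}: {[round(e, 5) for e in estimates]}")
```

Both runs settle toward 2, the value at which the target equals the estimate (x = 1 + x/2); comparing the two printed rows shows how quickly each one gets there.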



Disclaimer: I'm neither a professor nor a TA, so take my advice at your own risk. :)

StepSize just scales how far we move our old estimate toward (or away from) the target.  A useful way to think about it is "how big a step should we take toward the target's value?"  As for comparing the graphs, the scale they all share is the *number* of steps taken (18 steps for graphs 1-4, and 6 for graphs 5-10).  So I suggest plotting each graph as (Target or Estimate) vs. Step Number.
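
In code, a single step looks roughly like this.  It is only a sketch, assuming the update is NewEstimate = OldEstimate + StepSize * (Target - OldEstimate); the function name `update` is mine, not from the exercise.

```python
def update(old_estimate, target, step_size):
    """Move the old estimate a fraction step_size of the way to the target."""
    return old_estimate + step_size * (target - old_estimate)


old_estimate, target = 0.0, 1.0
for step_size in (0.0, 0.5, 1.0, -0.5):
    new_estimate = update(old_estimate, target, step_size)
    print(f"StepSize {step_size:4.1f}: {old_estimate} -> {new_estimate}")
```

A StepSize of 0 leaves the estimate where it was, 0.5 goes halfway, 1 lands exactly on the target, and a negative value steps away from it.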