Reinforcement Learning and Artificial Intelligence (RLAI)
Function Approximation Assignments
Part 1:
For this assignment, you can think of the domain as a 100 x 100 real-valued
coordinate system. You may realize that the dimensions don't matter, but having
a fixed size may make it easier to imagine for now.
Run the tile coder on the following data:
Training Data Points:

X Coordinate | Y Coordinate | Value
30.0         | 10.0         |  3.0
70.0         | 20.0         | -1.0
 8.0         | 50.0         |  5.0
Test (Query) Data Points:

X Coordinate | Y Coordinate | Value
55.8         | 45.2         | ?
37.5         | 99.0         | ?
30.5         |  9.5         | ?
30.5         | 15.0         | ?
Input the training data in the order given and set
all initial weights
(values)
to zero. Use 16 tilings over the XY space and examine what
happens when you
change the width of the tiles and alpha. To start, use a width of
10 (one tenth of the 100 x 100 space) and alpha = 0.1.
You can download the tile coding software from: http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html
(Documentation here)
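The interface of the downloaded tile coding software is not reproduced here. The sketch below is a minimal, self-contained Python stand-in that uses a simple uniform-offset grid tiling; the offset scheme, the helper names (active_tiles, predict, train), and the choice to divide alpha across the 16 tilings are illustrative assumptions, not the assignment's required interface.

    import numpy as np

    NUM_TILINGS   = 16
    TILE_WIDTH    = 10.0                       # to start: one tenth of the 100 x 100 space
    ALPHA         = 0.1                        # overall step size
    TILES_PER_DIM = int(100 / TILE_WIDTH) + 1  # one extra row/column to cover the offsets

    # One weight per tile in every tiling, all initialised to zero as required.
    weights = np.zeros((NUM_TILINGS, TILES_PER_DIM, TILES_PER_DIM))

    def active_tiles(x, y):
        """Index of the one active tile in each tiling, as (tiling, row, col).
        Each tiling is shifted by a fraction of the tile width; the downloaded
        software uses its own, more careful offset/hashing scheme."""
        idxs = []
        for t in range(NUM_TILINGS):
            offset = t * TILE_WIDTH / NUM_TILINGS
            col = int((x + offset) // TILE_WIDTH)
            row = int((y + offset) // TILE_WIDTH)
            idxs.append((t, row, col))
        return idxs

    def predict(x, y):
        return sum(weights[i] for i in active_tiles(x, y))

    def train(x, y, target):
        error = target - predict(x, y)
        for i in active_tiles(x, y):
            # Dividing alpha by the number of tilings makes one training pass
            # move the prediction at (x, y) by alpha * error in total.
            weights[i] += (ALPHA / NUM_TILINGS) * error

    # Training data, input in the order given.
    for x, y, v in [(30.0, 10.0, 3.0), (70.0, 20.0, -1.0), (8.0, 50.0, 5.0)]:
        train(x, y, v)

    # Query points.
    for x, y in [(55.8, 45.2), (37.5, 99.0), (30.5, 9.5), (30.5, 15.0)]:
        print(x, y, predict(x, y))

Re-running the three training updates with different TILE_WIDTH and ALPHA values is enough to produce the comparison asked for above.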
Handin:
- RELEVANT portions of your code. Do not hand in the RL code that you downloaded, PLEASE.
- Your results for various widths and alphas.
- Talk about your results a bit. It would be nice (but not necessary) for you to include some sort of 3d view (with Excel or similar) of what the function looks like after training. Basically, do the assignment, see what you find interesting, and briefly present it.
If you are using Python, you might find this 3d graphing software useful
(requires tk and tkinter).
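If matplotlib is more convenient than the toolkit's grapher, a 3d view can be produced roughly as follows; predict(x, y) here is the hypothetical query function from the sketch above, not a function from the downloaded software.

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

    # Evaluate the learned function on a grid over the 100 x 100 space.
    xs = np.arange(0.0, 100.0, 2.0)
    ys = np.arange(0.0, 100.0, 2.0)
    X, Y = np.meshgrid(xs, ys)
    Z = np.vectorize(predict)(X, Y)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('learned value')
    plt.show()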
Part 2:
The goal of this assignment is to become more
comfortable with TD learning techniques, eligibility traces, and
function approximation by implementing a program that incorporates all
three concepts. This programming assignment is much larger than those
earlier in the course and will be worth more than the other programming
assignments. Please plan accordingly.
In this assignment, you will implement a
Watkins Q(lambda) agent to be used with the RL Glue framework. Your
agent will learn in the Mountain Car environment, which is
described on p. 214 of the textbook and is available for RL Glue here.
Pseudocode for the Watkins Q(lambda) algorithm can be
found here (this
corrects a number of small problems with the pseudocode in the book). As
for the different parameters in the
Watkins Q(lambda) algorithm, set lambda = 0.9 and do not use
discounting. Use replacing traces and pick an epsilon value
substantially below the rule of
thumb of 0.1. Pick a suitable tile coding scheme and corresponding step
size (for a brief discussion of reasonable step sizes, revisit page 205
in your textbook). You can download the tile coding software from http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html
(Documentation here).
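As a reminder of how the pieces fit together, here is a rough sketch of the core Watkins Q(lambda) backup with replacing traces over binary tile features. The feature-vector size, the step size, and the function names are assumptions for illustration; your agent should follow the corrected pseudocode linked above and the RL Glue agent interface, and should obtain its active tiles from the downloaded tile coding software.

    import numpy as np

    NUM_FEATURES = 4096        # size of the tile-coder index space (assumed)
    NUM_ACTIONS  = 3           # Mountain Car: reverse, coast, forward
    LAMBDA  = 0.9
    GAMMA   = 1.0              # no discounting
    EPSILON = 0.01             # well below the 0.1 rule of thumb
    ALPHA   = 0.5 / 8          # example: step size divided by the number of tilings

    theta = np.zeros(NUM_FEATURES)   # weights, one per tile, initialised to zero
    e     = np.zeros(NUM_FEATURES)   # eligibility traces

    def q_value(features):
        """Action value = sum of the weights of the active tiles."""
        return theta[features].sum()

    def learn_and_choose(prev_features, reward, next_features_per_action, terminal=False):
        """One Watkins Q(lambda) backup for the state-action pair whose active
        tiles are prev_features, followed by epsilon-greedy selection of the
        next action. next_features_per_action[a] holds the active tiles for
        taking action a in the new state."""
        global theta, e
        delta = reward - q_value(prev_features)
        e[prev_features] = 1.0                     # replacing traces

        if terminal:                               # no bootstrap on the last step
            theta += ALPHA * delta * e
            e[:] = 0.0
            return None

        q_next = [q_value(f) for f in next_features_per_action]
        greedy = int(np.argmax(q_next))
        delta += GAMMA * q_next[greedy]
        theta += ALPHA * delta * e

        if np.random.random() < EPSILON:           # epsilon-greedy selection
            action = np.random.randint(NUM_ACTIONS)
        else:
            action = greedy

        if action == greedy:                       # Watkins: exploratory actions
            e *= GAMMA * LAMBDA                    # cut the traces to zero
        else:
            e[:] = 0.0
        return action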
Have your agent learn over the course of 100
episodes. Keep track of how many timesteps go by in each episode.
Repeat this process 100 times and present the average number of
timesteps taken for each of the 100 episodes. It would be nice, though
not necessary, if these results could be presented as a graph.
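A sketch of the outer experiment loop, with the two helpers left as hypothetical stubs (they would wrap your agent and the Mountain Car environment, e.g. through RL Glue):

    import numpy as np

    NUM_RUNS     = 100
    NUM_EPISODES = 100

    def reset_agent_and_environment():
        """Hypothetical: re-initialise weights, traces, and the car for a new run."""

    def run_episode():
        """Hypothetical: run one episode to the goal and return its timestep count."""
        return 0

    steps = np.zeros((NUM_RUNS, NUM_EPISODES))
    for run in range(NUM_RUNS):
        reset_agent_and_environment()
        for ep in range(NUM_EPISODES):
            steps[run, ep] = run_episode()

    # Average number of timesteps per episode, over the 100 runs.
    average_steps = steps.mean(axis=0)
    for ep, n in enumerate(average_steps, start=1):
        print(ep, n)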
Handin:
- The code for your Watkins Q(lambda) agent. Do not hand in the RL code that you downloaded, PLEASE.
- Your results for the number of timesteps taken in each of the 100 episodes, averaged over the 100 runs.
- An explanation (justification), possibly in the comments of your code, as to why you chose the tile coding scheme and step size that you did.
Are we supposed to start the car at a random position and velocity at the beginning of each episode? It seems the Mountain Car environment file sets them to (-0.5, 0). Should we modify the environment file to make it random? Thanks.
Cheers,
Peng