Reinforcement Learning and Artificial Intelligence (RLAI)
Function Approximation Assignments

Part 1:

    For this assignment you can think of the domain as a 100 x 100 real-valued coordinate system.  You may realize that the dimensions don't matter - but having a fixed size may make it easier to imagine for now.

Take the tile coder and run it on the following data:

Training Data Points:

    X Coordinate    Y Coordinate    Value
    30.0            10.0             3.0
    70.0            20.0            -1.0
     8.0            50.0             5.0

Test (Query) Data Points:

    X Coordinate    Y Coordinate    Value
    55.8            45.2            ?
    37.5            99.0            ?
    30.5             9.5            ?
    30.5            15              ?

    Input the training data in the order given and set all initial weights (values) to zero.  Use 16 tilings over the XY space and examine what happens when you change the width of the tiles and alpha.  To start, use a width of 10 (one tenth of the 100 x 100 space) and alpha = 0.1.

You can download the tile coding software from: http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html (Documentation here)
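
    Here is a minimal sketch of the Part 1 setup in Python. It uses a small hand-rolled grid tile coder rather than the RLtoolkit tilecoding module, so the hashing and exact tile boundaries will differ from the downloadable software, but the 16 tilings, the tile width of 10, and alpha = 0.1 match the values suggested above.

   import numpy as np

   NUM_TILINGS = 16
   TILE_WIDTH = 10.0      # one tenth of the 100 x 100 space
   ALPHA = 0.1            # step size, split across the tilings below
   TILES_PER_DIM = 11     # 100 / 10 tiles per dimension, plus one of slack for the offsets

   def active_tiles(x, y):
       """Return the one active tile index per tiling for the point (x, y)."""
       indices = []
       for t in range(NUM_TILINGS):
           # Offset each tiling by a different fraction of a tile width.
           offset = t * TILE_WIDTH / NUM_TILINGS
           col = int((x + offset) // TILE_WIDTH)
           row = int((y + offset) // TILE_WIDTH)
           indices.append((t * TILES_PER_DIM + row) * TILES_PER_DIM + col)
       return indices

   weights = np.zeros(NUM_TILINGS * TILES_PER_DIM * TILES_PER_DIM)  # all initial values zero

   def predict(x, y):
       return sum(weights[i] for i in active_tiles(x, y))

   def train(x, y, target):
       error = target - predict(x, y)
       for i in active_tiles(x, y):
           weights[i] += (ALPHA / NUM_TILINGS) * error

   # Training points, input in the order given.
   for x, y, value in [(30.0, 10.0, 3.0), (70.0, 20.0, -1.0), (8.0, 50.0, 5.0)]:
       train(x, y, value)

   # Query points.
   for x, y in [(55.8, 45.2), (37.5, 99.0), (30.5, 9.5), (30.5, 15.0)]:
       print(x, y, predict(x, y))

    Queries that fall within a tile width of a training point share many of its active tiles and so inherit much of its value, while queries far from all of the training data keep their initial value of zero; watch how this generalization changes as you vary the tile width and alpha.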

Handin:

If you are using Python, then you might find this 3D graphing software useful (requires Tk and Tkinter).


Part 2:

    The goal of this assignment is to become more comfortable with TD learning techniques, eligibility traces, and function approximation by implementing a program that incorporates all three concepts. This programming assignment is much larger than those earlier in the course and will be worth more than the other programming assignments. Please plan accordingly.

    In this assignment, you will be implementing a Watkins Q(lambda) agent to be used with the RL Glue framework. Your agent will be learning about the Mountain Car environment which is described on p. 214 of the textbook and is available for RL Glue here. Pseudocode for the Watkins Q(lambda) algorithm can be found here (this corrects a number of small problems with the pseudocode in the book). As for the different parameters in the Watkins Q(lambda) algorithm, set lambda = 0.9 and do not use discounting. Use replacing traces and pick an epsilon value substantially below the rule of thumb of 0.1. Pick a suitable tile coding scheme and corresponding step size (for a brief discussion of reasonable step sizes, revisit page 205 in your textbook). You can download the tile coding software from http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html (Documentation here).
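
    To make the update concrete, here is a hedged Python sketch of the core Watkins Q(lambda) step with replacing traces and linear function approximation over binary tile features. It is not tied to the RL-Glue agent interface or to the RLtoolkit tile coder; active_tiles (the list of feature indices passed in) stands in for whatever tile coding scheme you choose, NUM_FEATURES and ALPHA are example values, and the exact trace-clearing details should be checked against the corrected pseudocode linked above.

   import numpy as np

   NUM_ACTIONS = 3            # Mountain Car: reverse, coast, forward
   NUM_FEATURES = 4096        # size of the tile-coder feature space (example value)
   LAMBDA = 0.9               # trace decay, as specified above
   GAMMA = 1.0                # no discounting
   ALPHA = 0.5 / 8            # step size divided by the number of tilings (example value)
   EPSILON = 0.01             # exploration well below the 0.1 rule of thumb

   theta = np.zeros((NUM_ACTIONS, NUM_FEATURES))   # one weight vector per action
   traces = np.zeros_like(theta)                   # eligibility traces

   def q_value(tiles, action):
       return theta[action, tiles].sum()

   def epsilon_greedy(tiles):
       if np.random.rand() < EPSILON:
           return np.random.randint(NUM_ACTIONS), False    # exploratory action
       qs = [q_value(tiles, a) for a in range(NUM_ACTIONS)]
       return int(np.argmax(qs)), True                     # greedy action

   def update(tiles, action, reward, next_tiles, done):
       """One Watkins Q(lambda) backup; returns the next action to take."""
       delta = reward - q_value(tiles, action)
       # Replacing traces on the features of the action taken
       # (some versions also clear the traces of the other actions on these tiles).
       traces[action, tiles] = 1.0
       if done:
           theta += ALPHA * delta * traces
           traces[:] = 0.0
           return None
       next_action, greedy = epsilon_greedy(next_tiles)
       # Back up toward the greedy value, as Watkins's method requires.
       delta += GAMMA * max(q_value(next_tiles, a) for a in range(NUM_ACTIONS))
       theta += ALPHA * delta * traces
       if greedy:
           traces *= GAMMA * LAMBDA      # decay traces after a greedy action
       else:
           traces[:] = 0.0               # cut traces after an exploratory action
       return next_action

    An agent built around this would call update() once per environment step and take the returned action on the next step.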

    Have your agent learn over the course of 100 episodes. Keep track of how many timesteps go by in each episode. Repeat this process 100 times and present the average number of timesteps taken for each of the 100 episodes. It would be nice, though not necessary, if these results could be presented as a graph.
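
    A sketch of the outer experiment loop is below. The two helpers are hypothetical placeholders, not part of RL-Glue: run_episode() would run a single Mountain Car episode through your agent (for example via RL-Glue's RL_episode and RL_num_steps calls) and return its length in timesteps, and reset_learning() would re-initialize the agent's weights and traces between runs.

   import numpy as np

   NUM_RUNS = 100
   NUM_EPISODES = 100

   def reset_learning():
       """Hypothetical placeholder: wipe the agent's weights and traces for a fresh run."""
       pass

   def run_episode():
       """Hypothetical placeholder: run one episode and return the number of timesteps it took."""
       return 0

   steps = np.zeros((NUM_RUNS, NUM_EPISODES))
   for run in range(NUM_RUNS):
       reset_learning()
       for episode in range(NUM_EPISODES):
           steps[run, episode] = run_episode()

   # One learning curve: the average episode length over the 100 runs.
   average_steps = steps.mean(axis=0)
   for episode, avg in enumerate(average_steps, start=1):
       print(episode, avg)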

Handin:


Hi,
Are we supposed to start the car at a random position and velocity at the beginning of each episode? It seems the Mountain Car environment file sets them to (-0.5, 0). Should we modify the environment file to make it random? Thanks.

Cheers,
Peng  

^ Don't worry about updating the environment, just use it. Updating the environment is not the goal of the assignment, and the starting position of (-0.5, 0) should give you decent results.  

I ran into one problem with the environment: it can create an invalid task spec. It took me a little while to notice the small bug.

The buffers 'position' and 'velocity' in the env_init() function of MountainCar.cpp are declared to be exactly 20 bytes, and what goes into them takes exactly 20 characters. This means they may not have a null byte at the end, so when the task_spec is created it can end up with extra junk in it.

Easy fix. Change

   char position[20], velocity[20];

to

   char position[21], velocity[21];

~marcel  

^ Oh, this saddens me. I apologize. We should have caught this much earlier.  

The environment uses random starts. Check the env_start function.

Cheers,
Adam  

I have fixed the mountain car environment (MC_Random: Mountain Car w. random starting position and velocity):
* Task specification string
* missing reward
* rearranged some things to make it look nicer

Sorry about this. If you have any problems with the code don't hesitate to email me (awhite@cs).

Cheers,
Adam  
