Reinforcement Learning and Artificial Intelligence (RLAI)
Function Approximation Assignments
Part 1:
For this assignment, you can think of the domain as a 100 x 100 real-valued
coordinate system. You may realize that the dimensions don't matter, but having
a fixed size may make it easier to imagine for now.
Run the tile coder on the following data:
Training Data Points:

X Coordinate | Y Coordinate | Value
30.0         | 10.0         |  3.0
70.0         | 20.0         | -1.0
 8.0         | 50.0         |  5.0
Test (Query) Data Points:

X Coordinate | Y Coordinate | Value
55.8         | 45.2         | ?
37.5         | 99.0         | ?
30.5         |  9.5         | ?
30.5         | 15.0         | ?
Input the training data in the order given and set
all initial weights
(values)
to zero. Use 16 tilings over the XY space and examine what
happens when you
change the width of the tiles and alpha. To start, use a width of
10 (one tenth of the 100 x 100 space) and alpha = 0.1.
You can download the tile coding software from: http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html
(Documentation here)
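The interface of the downloaded tile coding software is not reproduced here. The sketch below is a minimal, self-contained Python stand-in that uses a simple uniform-offset grid tiling; the offset scheme, the helper names (active_tiles, predict, train), and the choice to divide alpha across the 16 tilings are illustrative assumptions, not the assignment's required interface.

    import numpy as np

    NUM_TILINGS   = 16
    TILE_WIDTH    = 10.0                       # to start: one tenth of the 100 x 100 space
    ALPHA         = 0.1                        # overall step size
    TILES_PER_DIM = int(100 / TILE_WIDTH) + 1  # one extra row/column to cover the offsets

    # One weight per tile in every tiling, all initialised to zero as required.
    weights = np.zeros((NUM_TILINGS, TILES_PER_DIM, TILES_PER_DIM))

    def active_tiles(x, y):
        """Index of the one active tile in each tiling, as (tiling, row, col).
        Each tiling is shifted by a fraction of the tile width; the downloaded
        software uses its own, more careful offset/hashing scheme."""
        idxs = []
        for t in range(NUM_TILINGS):
            offset = t * TILE_WIDTH / NUM_TILINGS
            col = int((x + offset) // TILE_WIDTH)
            row = int((y + offset) // TILE_WIDTH)
            idxs.append((t, row, col))
        return idxs

    def predict(x, y):
        return sum(weights[i] for i in active_tiles(x, y))

    def train(x, y, target):
        error = target - predict(x, y)
        for i in active_tiles(x, y):
            # Dividing alpha by the number of tilings makes one training pass
            # move the prediction at (x, y) by alpha * error in total.
            weights[i] += (ALPHA / NUM_TILINGS) * error

    # Training data, input in the order given.
    for x, y, v in [(30.0, 10.0, 3.0), (70.0, 20.0, -1.0), (8.0, 50.0, 5.0)]:
        train(x, y, v)

    # Query points.
    for x, y in [(55.8, 45.2), (37.5, 99.0), (30.5, 9.5), (30.5, 15.0)]:
        print(x, y, predict(x, y))

Re-running the three training updates with different TILE_WIDTH and ALPHA values is enough to produce the comparison asked for above.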
Handin:
- RELEVANT portions of your code. Do not hand in the RL code that you downloaded, PLEASE.
- Your results for various widths and alphas.
- Talk about your results a bit. It would be nice (but not necessary) for you to include some sort of 3d view (with Excel or similar) of what the function looks like after training. Basically, do the assignment, see what you find interesting, and briefly present it.
If you are using Python, you might find this 3d graphing software useful
(requires tk and tkinter).
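If matplotlib is more convenient than the toolkit's grapher, a 3d view can be produced roughly as follows; predict(x, y) here is the hypothetical query function from the sketch above, not a function from the downloaded software.

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

    # Evaluate the learned function on a grid over the 100 x 100 space.
    xs = np.arange(0.0, 100.0, 2.0)
    ys = np.arange(0.0, 100.0, 2.0)
    X, Y = np.meshgrid(xs, ys)
    Z = np.vectorize(predict)(X, Y)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('learned value')
    plt.show()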
Part 2:
The goal of this assignment is to become more
comfortable with TD learning techniques, eligibility traces, and
function approximation by implementing a program that incorporates all
three concepts. This programming assignment is much larger than those
earlier in the course and will be worth more than the other programming
assignments. Please plan accordingly.
In this assignment, you will implement a
Watkins Q(lambda) agent to be used with the RL Glue framework. Your
agent will learn in the Mountain Car environment, which is
described on p. 214 of the textbook and is available for RL Glue here.
Pseudocode for the Watkins Q(lambda) algorithm can be
found here (this
corrects a number of small problems with the pseudocode in the book). As
for the different parameters in the
Watkins Q(lambda) algorithm, set lambda = 0.9 and do not use
discounting. Use replacing traces and pick an epsilon value
substantially below the rule of
thumb of 0.1. Pick a suitable tile coding scheme and corresponding step
size (for a brief discussion of reasonable step sizes, revisit page 205
in your textbook). You can download the tile coding software from http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tilecoding.html
(Documentation here).
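As a reminder of how the pieces fit together, here is a rough sketch of the core Watkins Q(lambda) backup with replacing traces over binary tile features. The feature-vector size, the step size, and the function names are assumptions for illustration; your agent should follow the corrected pseudocode linked above and the RL Glue agent interface, and should obtain its active tiles from the downloaded tile coding software.

    import numpy as np

    NUM_FEATURES = 4096        # size of the tile-coder index space (assumed)
    NUM_ACTIONS  = 3           # Mountain Car: reverse, coast, forward
    LAMBDA  = 0.9
    GAMMA   = 1.0              # no discounting
    EPSILON = 0.01             # well below the 0.1 rule of thumb
    ALPHA   = 0.5 / 8          # example: step size divided by the number of tilings

    theta = np.zeros(NUM_FEATURES)   # weights, one per tile, initialised to zero
    e     = np.zeros(NUM_FEATURES)   # eligibility traces

    def q_value(features):
        """Action value = sum of the weights of the active tiles."""
        return theta[features].sum()

    def learn_and_choose(prev_features, reward, next_features_per_action, terminal=False):
        """One Watkins Q(lambda) backup for the state-action pair whose active
        tiles are prev_features, followed by epsilon-greedy selection of the
        next action. next_features_per_action[a] holds the active tiles for
        taking action a in the new state."""
        global theta, e
        delta = reward - q_value(prev_features)
        e[prev_features] = 1.0                     # replacing traces

        if terminal:                               # no bootstrap on the last step
            theta += ALPHA * delta * e
            e[:] = 0.0
            return None

        q_next = [q_value(f) for f in next_features_per_action]
        greedy = int(np.argmax(q_next))
        delta += GAMMA * q_next[greedy]
        theta += ALPHA * delta * e

        if np.random.random() < EPSILON:           # epsilon-greedy selection
            action = np.random.randint(NUM_ACTIONS)
        else:
            action = greedy

        if action == greedy:                       # Watkins: exploratory actions
            e *= GAMMA * LAMBDA                    # cut the traces to zero
        else:
            e[:] = 0.0
        return action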
Have your agent learn over the course of 100
episodes. Keep track of how many timesteps go by in each episode.
Repeat this process 100 times and present the average number of
timesteps taken for each of the 100 episodes. It would be nice, though
not necessary, if these results could be presented as a graph.
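A sketch of the outer experiment loop, with the two helpers left as hypothetical stubs (they would wrap your agent and the Mountain Car environment, e.g. through RL Glue):

    import numpy as np

    NUM_RUNS     = 100
    NUM_EPISODES = 100

    def reset_agent_and_environment():
        """Hypothetical: re-initialise weights, traces, and the car for a new run."""

    def run_episode():
        """Hypothetical: run one episode to the goal and return its timestep count."""
        return 0

    steps = np.zeros((NUM_RUNS, NUM_EPISODES))
    for run in range(NUM_RUNS):
        reset_agent_and_environment()
        for ep in range(NUM_EPISODES):
            steps[run, ep] = run_episode()

    # Average number of timesteps per episode, over the 100 runs.
    average_steps = steps.mean(axis=0)
    for ep, n in enumerate(average_steps, start=1):
        print(ep, n)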
Handin:
- The code for your Watkins Q(lambda) agent. Do not hand in the RL code that you downloaded, PLEASE.
- Your results for the number of timesteps taken in each of the 100 episodes, averaged over the 100 runs.
- An explanation (justification), possibly in the comments of your code, as to why you chose the tile coding scheme and step size that you did.
Are we supposed to start the car at a random position and velocity at the beginning of each episode? It seems the Mountain Car environment file sets them to (-0.5, 0). Should we modify the environment file to make it random? Thanks.
Cheers,
Peng