Reinforcement Learning and Artificial Intelligence (RLAI)
	Tiles Reference Manual (development version)

The ambition of this web page is to provide information on the usage of the tile coding software of the RL toolkit.

Tile Coding Software, Version 2.0

by Richard S. Sutton

Introduction
tiles (the main routine)
Example: 32 4x4 Rectangular Grid-Tilings
Shortcuts for 1 or 2 Floating-Point Variables
Simple (Single-Call) Variations
Complex (Multiple-Call) Variations
Getting Consistent Hashing
Lisp Versions
Python Versions
Python calling C Versions
Safe Versions
Wrap-around Versions

Introduction

Here we describe code in C++ and LISP implementing the core part of the idea of tile coding with hashing. Tile coding is a way of representing the values of a vector of continuous variables as a large binary vector with few 1s and many 0s. The binary vector is not represented explicitly of course, but as a list of the components that are 1s. The main step is to partition, or tile, the continuous space multiple times and select one tile from each tiling, that corresponding the the vector's value. Each tile is converted (hashed down to) an element in the big binary vector, and the list of the tile (element) numbers is returned as the representation of the vector's value. Information and examples of tile coding and its use in function approximation and reinforcement learning are given in Section 8.3.2 of the RL textbook.

Tile coding is the key distinguishing computational idea behind CMACs. Implementations of complete CMACs are available from the folks at the University New Hampshire here. Their package is more suited to straight, bundled CMAC use. Here we separate tile coding from full CMACs, which allows it to be more flexibly used, e.g., with eligibility traces. The current software is also simplified and streamlined consistent with its use in reinforcement learning applications. For example, our code is only a few pages in length and there is just one routine to call. In its normal use that routine has no memory and requires no special setup or data types.

This version 2.0 of the tiling software provides an alternate version of the main routine that is "safe" in that it corrects for all hashing collisions, so as to exactly simulate infinite tilings (until it runs out of memory). This version can also be used to gather statistics on hashing collisions even if they are be ignored (as in the standard version). The alternate version requires that an additional hashing table be passed in, and there is greater computational expense.

Next we describe the use of the tile coding routine. This description applies to both the C and LISP implementations except that some of the data handling is a little simpler in LISP. Mostly we assume we are talking about C; the simplifications for LISP users are summarized here. The C++ code assumes the existence of a function "rand()" that produces successive psuedorandom integers, of which only the low-order bytes are used. Recently we have added a Python version which has not yet been extensively tested.

tiles

The tile coding software provides a single routine, "tiles":

void tiles(
 int tiles[], // provided array will contain the returned tile indices
 int num_tilings, // number of tile indices to be returned
 int memory_size, // total number of possible tiles
 float floats[], // array of floating-point vars to be converted to tiles
 int num_floats, // number of floating-point variables
 int ints[], // array of integer variables to be converted to tiles
 int num_ints) // number of integer variables

The overall idea here is as follows. The first argument (tiles) is the output of the routine. This array is provided by the caller and filled up by tiles with tile indices (integers in [0,memory_size)). The second argument is the number of tile indices returned, so the provided array had better be of at least this length. It is recommended by the folks at UNH (who designed the key algorithm here) that num_tilings be a power of two.

Next, memory_size specifies the largest permitted tile index. The next two arguments are the primary inputs to tiles. They specify the floating-point variables which are to be tiled, in an array, with an accompanying integer argument to indicate the length of the array. The routine will overlay grid-tilings on these variables, generalizing across values that differ by less than 1.0.

The last two arguments are optional. For each different value of the num_ints integers in ints, the tiles will be hashed totally differently, such that there is no generalization to any other value of the integers. These inputs are often used to ensure different hashings for different groups of tilings, e.g., for different actions in reinforcement learning. There are some illustrative examples below. Alternatively, if there are three such integers or less, they can be provided as simple integer arguments to tiles (in place of the last two arguments as described above). These cases will be intercepted by a specialized version of tiles, the integers packed into an array automatically, and finally the base version of tiles, as described above, will be called. There is a similar shortcut for the case with one or two floating-point variables, described below.

Next we describe examples, gradually increasing in complexity, of the use of the tiles routine.

Example: 32 4x4 Rectangular Grid-Tilings

In the simplest case, we form simple grid-like tilings. Suppose we are interested in the unit square over two real variables x and y. We wish to tile it into 4x4 tilings (i.e., each tile is .25 by .25). This will create the possibility of generalization between any two input vectors in which both components are within .25 of each other. tiles assumes unit generalization in its inputs, so we arrange for .25 generalization by scaling x and y and stuffing the results into a float array float_array to be input to tiles:

float float_array[2];

float_array[0] = x / 0.25;
float_array[1] = y / 0.25;

Despite this fairly broad generalization, we would like the ability to make fine distinctions if required. For this we need many tilings, say 32 (a power of two is good here). This will give us an ultimate resolution of .25/32=.0078, or better than 1%. To arrange for this we pass the tiles routine an integer array, tiles_array, of at least 32 elements, to hold the returned results, and pass 32 as the num_tilings argument.

int tiles_array[32];

Finally, we have to select memory_size, the range of tile indices that will be hashed down into. First we consider the "natural" memory size, as if we were doing no hashing. This is simply the total number of tiles in all the tilings, or 4*4*32=512 in our case. This is not a large number and might be used directly as the memory_size. Note, however, that this does not mean there will be no hashing or hash collisions. This routine always hashes and so collisions are possible. You could make them very unlikely by using a large memory size, say 5120, or even 512000 if you have enough memory for how the tiles are to be used. More often the natural memory size is too large, say 10^10, and we must hash into a smaller memory size.

Thus, our final call to tiles routine might be

tiles(tiles_array,32,512,float_array,2);

The returned tiles can be used in many ways, as is discussed elsewhere. But let us complete this simple example by showing one use. To do simple linear function approximation one uses an array of weights of number equal to the total number of possible tiles (memory_size). The approximate function value for the input vector of real values is obtained simply by summing the weights of the corresponding returned tiles:

float weights[512];

float result = 0.0;

for (i=0;i<32;i++) \

    result += weights[tiles_array[i]];

Finally, for learning, assume we have a value target that we wish to adjust the approximation result towards:



float target;

float alpha = 0.1 / 32; // step-size parameter

for (i=0;i<32;i++) \\

    weight[tiles_array[i]] += alpha * (target - result);

Shortcuts for 1 or 2 Floating-Point Variables

For simple uses of tiles, packing into arrays is tiresome, so we provide routines to do the packing for you for the cases with 1 or 2 floating-point state variables, called tiles1 and tiles2. For example, the 4x4 grid case described above could be done without explicitly packing into the float_array by calling

tiles2(tiles_array,32,512,x/0.25,y/0.25);

Simple (Single-Call) Variations

There are a number of useful simple variations on the example described above. Obviously, one could scale the two dimensions differently to get rectangular rather than square tiles. One could also stretch and distort the dimensions, for example, making one logarithmic. Or one could obtain a rotated grid by linear transformations on the original x and y variables. For example, to get a grid oriented along the diagonals, we construct the float_array like this:

float_array[0] = (x+y) / 0.25;
float_array[1] = (x-y) / 0.25;

Note that the tiling produced in all these cases is not restricted to any region of the space. It extends in all directions to infinity; it is a tiling of the infinite plane. These infinite number of tiles are reduced to memory_size in number by the hashing.

If you want to restrict the input vector to a limited region, then you must do so yourself before calling tiles. If you want to achieve a wrap-around effect, then you should cause wrap-around when you construct the float_array, e.g., by dividing by the width of the region and filling float_array with the remainder.

Complex (Multiple-Call) Variations

Sometimes one can get by with a single kind of tiling, as produced by a single call to tiles (note that a single call creates multiple tillings, but these are all simple offset copies of each other). But for many other cases one needs to be a little more sophisticated. A great deal can be done to control the generalization and to limit the memory requirements by using multiple tilings through multiple calls to tiles. We consider some of these below.

Suppose as previously you have input vectors in the unit square, but you want to generalize between input vectors that are similar along the x dimension OR similar along the y dimension. To do this you need two kinds of tilings, one producing stripes horizontally and one vertically. Suppose you want to make 32 tilings of each type, and combine the results in one 64-element tiles_array:

int tiles_array[64];

tiles1(tiles_array,32,4*32*2,x/0.25,0);
tiles1(&tiles_array[32],32,4*32*2,y/0.25,1);

The second call to tiles passes in, in effect, the second half of the full tiles_array. The tiles from the second group of tilings will go there. Notice that the memory_size input is the same for both calls and set roughly equal to the memory requirements for both sets of tiles: 4 (stripes over the unit interval) * 32 (number of tilings within a stripe) * 2 (because there are two sets of tilings over this memory space). As always, the memory_size can be chosen within wide limits, but it makes sense to chose it the same for all the groups of tilings. Finally, notice that we change the final integer input from 0 to 1 in the second call. Whenever using multiple calls like this, it is a good idea to give them each a different setting for the integer arguments.

There are a lot of possibilities here. One could use two groups of tilings, one for each single dimension, as illustrated above, PLUS an additional group of tilings that depends on both dimensions. This would create a system that generalizes partly along each dimension separately, but also on the two dimensions conjunctively, so that it could if needed learn to give a special response to a particular pair of x,y values. Another possibility would be to choose some groups of tilings with coarse generalization and some with fine.

One can control the tendency to generalize in the different ways given by each group of tilings by controlling the number of tilings in each group. For example, if you wanted discrimination to be stronger along x rather than y, then you would ask for more x tillings (more than 32) in the above example, or fewer y tilings.

Getting Consistent Hashing

In many applications one wants to make sure hashing, a random process, is done in a consistent way, e.g., across different runs of a program. The hashing done by the routines described here is determined by some tables filled with random numbers the first time the routine is called. To get consistent hashing you need only ensure that the random number generator is in a consistent state for the first call. A convenient way to do this is to make a dummy first call during an initialization portion of your program. For example, an early portion of your C++ code could include:

int dummy_tiles[1];
float dummy_vars[1];

srand(0);
tiles(dummy_tiles,1,1,dummy_vars,0); // initializes tiling code

In lisp, you would use an implementation-dependent procedure to set *random-state* in a reproducable manner, then call (tiles nil 1 1).

In Python, you can seed the random number generator with random.seed(number), and then import the tiles module.

Lisp Versions

The LISP function "tiles" returns a list of tile indices. It is called as follows:

(tiles 
 num-tilings ; the number of tilings desired
 memory-size ; the number of possible tile indices
 floats ; a list of real values making up the input vector
 ints) ; optional list of integers to get different hashings

All the inputs have the same meanings and uses as described above. Multiple calls are combined by appending (or nconcing) the resultant lists into one list.

The function "load-tiles" is also available as a maximally efficient interface. It loads a portion of a provided array with the returned tile indices. It is called as follows:

(load-tiles
 tiles ; a provided array for the tile indices to go into
 starting-element ; first element of "tiles" to be changed (typically 0)
 num-tilings ; the number of tilings desired
 memory-size ; the number of possible tile indices
 floats ; a list of real values making up the input vector
 ints) ; list of optional inputs to get different hashings

Python Versions

The Python function "tiles" returns a tuple of tile indices. It is called as follows:

tiles(numtilings, memsize, floats, ints)

All the inputs have the same meanings and uses as described above. Multiple calls are combined by concatenating the resultant tuples into one.

The function "loadtiles" is also available as a maximally efficient interface. It loads a portion of a provided array with the returned tile indices. It is called as follows:

loadtiles(tiles, startelement, numtilings, memsize, floats, ints)

with meanings as in the lisp calls.

Python Calling C Versions

In order to improve the efficiency of the tiling functions, but still have the flexibility of Python, there is also a version available for Python which calls the C version of tiles. The calling sequences are exactly the same as for the Python versions, although the module is loaded from a different location (e.g. CTiles.tiles instead of RLtoolkit.tiles).

Safe Versions

With the code described so far, there is a small probability that unrelated inputs will hash into some of the same tiles. In a group of tilings, usually there will be no more than one such "collision", so that it is not a big problem; the learning process will sort it out. There will not be a big effect on performance unless the memory is too small or the hash functions are poorly designed. Nevertheless, the possibility of such a problem is annoying. When one's program doesn't work, there is a tendency, deserved or not, to suspect a failure of the hashing function.

To alleviate this kind of concern, we provide another "safe" version of the routine that does the additional work to prevent hashing collisions -- or rather to detect them and take the necessary corrective steps. The result is a guaranteed exact emulation of the infinite tilings. Every different tile used will have a different index. If more tiles are used than there is space within the memory_size, then an error is signaled. Of course there is also a cost in terms of additional memory and computation.

The additional memory is contained in an additional data structure, called a "collision table". To create and use a collision table in the original 4x4 example:

collision_table ct(512,1);
float float_array[2];
int tiles_array[32];

float_array[0] = x / 0.25;
float_array[1] = y / 0.25;
tiles(tiles_array,32,&ct,float_array,2);

In the declaration of the collision table, the 512 is the size of the table. The size must be a power of two. The 1 in the declaration indicates that this is to be a "safe" table; provide a 0 here if you just want to collect statistics on a regular table. Notice that the collision table is then used in place of the memory_size in calls to tiles. This is possible because of course the collision table has the memory_size stored within it.

You can call ct.usage() to determine how many entries in the table are currently in use. You can also examine ct.calls, ct.clearhits, and ct.collisions to determine the number of times this collision table has been used (calls), has returned a result without a collision (clearhits), and has had collisions (collisions, which may be more than one per call).

ct.reset clears the collision table, making room for a new set of entries.

In Lisp, you construct a collision-table by calling make-ct:

(make-ct

    memory-size ; the number of possible tile
indices

    safety-level) ; one of :safe, :unsafe, or
:super-safe

which returns the newly initialized collision-table. The memory-size here must be a power of two, e.g., 1024; otherwise it is the same memory-size as before. If the safety-level is :unsafe, then the collision table is used only to collect statistics; collisions are still allowed and not corrected for. Normally, a safety-level of :safe is adequate, but :super-safe is available to be absolutely sure.

Once a collision table has been constructed, it is used in any of the routines previously described simply by passing it in in place of the memory-size input to the routine. For example, a lisp version of the 32 4x4 tilings example, using the safe version, would look like this:

(setq ctable (make-ct 512 :safe)) ; this is done once

(tiles 32 ctable (list (/ x .25) (/ y .25))) ; done for each x,y

In Python, you obtain a CollisionTable by calling CollisionTable(memsize, safetyval), with meanings as above except safetyval is one of 'safe', 'unsafe', or 'super safe'. Once a collision table has been constructed, it is used in any of the routines previously described simply by passing it in in place of the memsize input to the routine. For example, a python version of the 32 4x4 tilings example, using the safe version, would look like this:

ctable = CollisionTable(512, 'safe') # this is done once

tiles(32, ctable, (x/0.25, y/0.25)) # this is done for each x,y

The version where Python calls C does not have the 'super safe' collision table options, since it has not been implemented in the C version (the required comparisons would slow things down considerably).

Wrap-around Versions

The tilings we have discussed so far stretch out to infinity with no need to specify a range for them. This is cool, but sometimes not what you want. Sometimes you want the variables to wrap-around over some range. For example, you may have an angle variable that goes from 0 to 2pi and then should wrap around, that is, you would like generalization to occur between angles just greater than 0 and angles that are nearly 2pi. To accomodate this, we provide some alternative, wrap-around versions of the tiling routines.

These versions take an additional input, wrap-widths, which parallels the float structure (array or list), and which specifies for each float the width of the range over which it wraps (e.g., 2pi). If you don't want a float to wrap, it's wrap-width should be zero (in LISP, nil). The wrapping width is in the same units as the floats, but should be an integer. Specifying the complete range of wrapping (rather than just the width) is not necessary for the same reason we did not need to give a range at all in the previous routines.

For example, suppose you wanted to cover the 2pi x 3 rectangular area with 16 tilings, each with a width of generalization one-tenth of the space in each dimension, and with wrap-around in the first dimension but not the second. In C you would do

int tiles_array[16];

float float_array[2];
int wrap_array[2];

float_array[0] = x / (3 * 0.1);
wrap_array[0] = 0;

float_array[1] = y / (2 * 3.14159 * 0.1);
wrap_array[1] = 10;

tileswrap(tiles_array,16,512,float_array,2,wrap_array);

where tileswrap has the interface:

void tileswrap(
 int tiles[], // provided array will contain the returned tile indices
 int num_tilings, // number of tile indices to be returned
 int memory_size, // total number of possible tiles
 float floats[], // array of floating-point vars to be converted to tiles
 int num_floats, // number of floating-point variables
 int wrap_widths[], // array of widths (length and units as in floats)
 int ints[], // array of integer variables to be converted to tiles
 int num_ints) // number of integer variables

Previous C versions had a bug because of an oddity of the C mod function, but this has been fixed.

In LISP, the wrapping routines are:

(tiles-wrap 
 num-tilings ; the number of tilings desired
 memory-size ; the number of possible tile indices
 floats ; a list of real values making up the input vector
 wrap-widths ; list of widths for wrap-around on floats (ints or nil)
 ints) ; optional list of integers to get different hashings

and

(load-tiles-wrap 
 tiles ; a provided array for the tile indices to go into
 starting-element ; first element of "tiles" to be changed (typically 0)
 num-tilings ; the number of tilings desired
 memory-size ; the number of possible tile indices
 floats ; a list of real values making up the input vector
 wrap-widths ; list of widths for wrap-around on floats (ints or nil)
 ints) ; list of optional inputs to get different hashings

In Python (and the Python calling C version), the wrapping routines are:

tileswrap(numtilings, memsize, floats, wrapwidths, ints)

and





loadtileswrap(tiles, startelement, numtilings, memsize, floats,

wrapwidths, ints)

Extend this Page How to edit Style Subscribe Notify Suggest Help This open web page hosted at the University of Alberta. Terms of use 931/1