Tile Coding Software

Tiles.c and Tiles.lisp

by Richard S. Sutton

tiles.C (with tiles.h) and tiles.lisp are code in C and LISP implementing the core part of the idea of tile coding with hashing. Tile coding is a way of representing the values of a vector of continuous variables as a large binary vector with few 1s and many 0s. The binary vector is not represented explicitly of course, but as a list of the components that are 1s. The main step is to partition, or tile, the continuous space multiple times and select one tile from each tiling, that corresponding the the vector's value. Each tile is converted (hashed down to) an element in the big binary vector, and the list of the tile (element) numbers is returned as the representation of the vector's value. Information and examples of tile coding and its use in function approximation and reinforcement learning are given in Section 8.3.2 of our RL book.

Tile coding is the key distinguishing computational idea behind CMACs. Implementations of complete CMACs are available here from the folks at the University New Hampshire. That package is more suited to straight, bundled CMAC use. Here we separate tile coding from full CMACs, which allows it to be more flexibly used, e.g., with eligibility traces. The current software is also simplified and streamlined consistent with its use in reinforcement learning applications. For example, our code is only one or two pages in length and there is just one routine to call. That routine has no memory and requires no special setup or data types.

Here we describe the use of the tile coding routine. This description applies to both the C and LISP implementations except that some of the data handling is a little simpler in LISP. Mostly we assume we are talking about C; the simplifications for LISP users are summarized at the end.


GetTiles

The tile coding software provides a single, memoryless routine, called "GetTiles":

void GetTiles(
    int *tiles,                // provided array will contain the returned tiles (tile indices)
    int num_tilings,           // number of tile indices to be returned      
    float *variables,          // array of variables to be converted to tiles
    int num_variables,         // number of variables
    int memory_size,           // total number of possible tiles
    int hash1 = -1,            // optional arguments to get different hashings
    int hash2 = -1,            
    int hash3 = -1)            

The overall idea here is as follows. The first argument (tiles) is the output of the routine. This array is provided by the caller and filled up by GetTiles with tile indices (integers in [0,memory_size)). The second argument is the number of tile indices returned, so the provided array had better be of at least this length. The next argument is the primary input to GetTiles, the vector of continuous values (variables), and the following argument is the number of components to this vector. Finally, memory_size specifies the largest permitted tile index. This routine also allows from 0 to 3 optional hash arguments, all of which (if provided) should be different from -1. These are useful when a common memory space is shared over multiple calls to GetTiles, each of which is given a different set of values for these arguments. In this way, the multiple calls will hash in different ways, with only hashing collisions, rather than all to the same tiles. Next we describe examples, gradually increasing in complexity, of the use of the tiles routine.


Example: 32 4x4 Rectangular Grid-Tilings

In the simplest case, we form simple grid-like tilings. Suppose we are interested in the unit square over two real variables x and y. We wish to tile it into 4x4 tilings (i.e., each tile is .25 by .25). This will create the possibility of generalization between any two input vectors in which both components are within .25 of each other. GetTiles assumes unit generalization in its inputs, so we arrange for .25 generalization by scaling x and y and stuffing the results into a float array vars_array to be input to tiles:
float vars_array[2];
vars_array[0] = x / 0.25;
vars_array[1] = y / 0.25;

Despite this fairly broad generalization, we would like the ability to make fine distinctions if required. For this we need many tilings, say 32 (a power of two is good here). This will give us an ultimate resolution of .25/32=.0078, or better than 1%. To arrange for this we pass the tiles routine an integer array, tiles_array, of at least 32 elements, to hold the returned results, and pass 32 as the num_tilings argument.

int tiles_array[32];

Finally, we have to select memory_size, the range of tile indices that will be hashed down into. First we consider the "natural" memory size, as if we were doing no hashing. This is simply the total number of tiles in all the tilings, or 4*4*32=512 in our case. This is not a large number and might be used directly as the memory_size. Note, however, that this does not mean there will be no hashing or hash collisions. This routine always hashes and so collisions are possible. You could make them very unlikely by using a large memory size, say 5120, or even 512000 if you have enough memory for how the tiles are to be used. More often the natural memory size is too large, say 10^10, and we must hash into a smaller memory size.

Thus, our final call to GetTiles routine might be

GetTiles(tiles_array,32,vars_array,2,512);

The returned tiles can be used in many ways, as is discussed elsewhere. But let us complete this simple example by showing one use. To do simple linear function approximation one uses an array of weights of number equal to the total number of possible tiles (memory_size). The approximate function value for the input vector of real values is obtained simply by summing the weights of the corresponding returned tiles:

float weights[32]=0.0;
float result=0.0;
for (i=0;i<32;i++) result += weights[tiles_array[i]];
Finally, for learning, assume we have a value target that we wish to adjust the approximation result towards:
float target;
float alpha=0.1;  // step-size parameter
for (i=0;i<32;i++) weight[tiles_array[i]] += alpha * (target - result);

Simple (Single-Call) Variations

There are a number of useful simple variations on the example described above. Obviously, one could scale the two dimensions differently to get rectangular rather than square tiles. One could also stretch and distort the dimensions, for example, making one logarithmic. Or one could obtain a rotated grid by linear transformations on the original x and y variables. For example, to get a grid oriented along the diagonals, we construct the vars_array like this:

vars_array[0] = (x+y) / 0.25;
vars_array[1] = (x-y) / 0.25;

Note that the tiling produced in all these cases is not restricted to any region of the space. It extends in all directions to infinity; it is a tiling of the infinite plane. These infinite number of tiles are reduced to memory_size in number by the hashing.

If you want to restrict the input vector to a limited region, then you must do so yourself before calling GetTiles. If you want to achieve a wrap-around effect, then you should cause wrap-around when you construct the vars_array, e.g., by dividing by the width of the region and filling vars_array with the remainder.

Complex (Multiple-Call) Variations

Sometimes one can get by with a single kind of tiling, as produced by a single call to GetTiles (note that a single call creates multiple tillings, but these are all simple offset copies of each other). But for many other cases one needs to be a little more sophisticated. A great deal can be done to control the generalization and to limit the memory requirements by using multiple tilings through multiple calls to GetTiles. We consider some of these below.

Suppose as previously you have input vectors in the unit square, but you want to generalize between input vectors that are similar along the x dimension OR similar along the y dimension. To do this you need two kinds of tilings, one producing stripes horizontally and one vertically. Suppose you want to make 32 tilings of each type, and combine the results in one 64-element tiles_array:

float vars_array[1];
int tiles_array[64];
vars_array[0] = x / 0.25;
GetTiles(tiles_array,32,vars_array,1,4*32*2,0);
vars_array[0] = y / 0.25;
GetTiles(&tiles_array[32],32,vars_array,1,4*32*2,1);
The second call to GetTiles passes in, in effect, the second half of the full tiles_array. The tiles from the second group of tilings will go there. Notice that the memory_size input is the same for both calls and set roughly equal to the memory requirements for both sets of tiles: 4 (stripes over the unit interval) * 32 (number of tilings within a stripe) * 2 (because there are two sets of tilings over this memory space). As always, the memory_size can be chosen within wide limits, but it makes sense to chose it the same for all the groups of tilings. Finally, notice that we change the hash1 input from 0 to 1 in the second call. Whenever using multiple calls like this, it is a good idea to give them each a different setting for the hash arguments.

There are a lot of possibilities here. One could use two groups of tilings, one for each single dimension, as illustrated above, PLUS an additional group of tilings that depends on both dimensions. This would create a system that generalizes partly along each dimension separately, but also on the two dimensions conjunctively, so that it could if needed learn to give a special response to a particular pair of x,y values. Another possibility would be to choose some groups of tilings with coarse generalization and some with fine.

One can control the tendency to generalize in the different ways given by each group of tilings by controlling the number of tilings in each group. For example, if you wanted discrimination to be stronger along x rather than y, then you would ask for more x tillings (more than 32) in the above example, or fewer y tilings.


Getting Consistent Hashing

In many applications one wants to make sure hashing, a random process, is done in a consistent way. The hashing done by the routines described here is determined by some tables filled with random numbers the first time the routine is called. To get consistent hashing you need only ensure that the random number generator is in a consistent state for the first call. A convenient way to do this is to make a dummy first call during an initialization portion of your program. For example, an early portion of your C++ code could include:

int dummy_tiles[1];
float dummy_vars[1];
srand(0);
GetTiles(dummy_tiles,1,dummy_vars,0,1);
In lisp, you would use an implementation-dependent procedure to set *random-state* in a reproducable manner, then call (get-tiles nil 1 1).

Tiles.lisp

The LISP function "get-tiles" returns a list of tile indices. It is called as follows:

(get-tiles                   
   variables               ; a list of real values making up the input vector
   num-tilings             ; the number of tilings desired
   memory-size             ; the number of possible tile indices
   &rest hash-list)        ; optional inputs to get different hashings
All the inputs have the same meanings and uses as described above. Multiple calls are combined by appending (or nconcing) the resultant lists into one list. Up to 10 additional hashing arguments can be provided. For each setting of these, hashing will be done in a different random way.

The function "load-tiles" is also available as a maximally efficient interface. It loads a portion of provided array with the returned tile indices. It is called as follows:

(load-tiles  
   tiles                   ; a provided array for the tile indices to go into 
   starting-element        ; the first element of "tiles" to be changed (typically 0)        
   num-tilings             ; the number of tilings desired
   variables               ; a list of real values making up the input vector
   memory-size             ; the number of possible tile indices
   &rest hash-list)        ; optional inputs to get different hashings