 Reinforcement Learning and Artificial Intelligence (RLAI) Tile Coding Software -- Reference Manual, Version 3.0

Here we describe software implementing the core part of the idea of tile coding as described in Section 9.5.4 of the reinforcement-learning textbook by Sutton and Barto. That section should be read and fully understood before reading this manual or using this software. The software is currently available in Python and Common Lisp. An implementation in C or C++ would be a welcome contribution.

Tile coding is a way of representing the values of a vector of continuous variables as a large binary vector with few 1s and many 0s. The binary vector is not represented explicitly, but as a list of the components that are 1s. The main step is to partition, or tile, the continuous space multiple times and select one tile from each tiling, that corresponding the the vector's value. Each tile is converted to an element in the big binary vector, and the list of the tile (element) numbers is returned as the representation of the vector's value. Tile coding is recommended as a way of applying online learning methods to domains with continuous state or action variables.

The tile-coding software evolved from software developed at the University of New Hampshire for a related but more specialized use called "CMAC" (see the
external documentation by Miller and Glanz). Here we separate tile coding from full CMACs, which allows it to be used more flexibly. The current software is also simplified and streamlined consistent with its use in reinforcement learning applications. For example, our code is only one page in length.

The code in Python is available here, and in Lisp is available here.

# A Simple Example

Here is a simple example of the use of the tile coding software from Python.

Whatever you are tiling, you will need an index hash table, or IHT. You make one by calling `IHT(size)`, e.g.,
``iht = IHT(1024)``
Suppose you wish to tile a two-dimensional x,y space, where each dimension runs from 0 to 10. The tiling software will partition this using multiple 10 by 10 partitions, each offset from the other (it is 10 by 10 because the software partitions at integer boundaries). You obtain a list of tile indices for a point (here the point (x, y) = (3.6, 7.21)) by calling
``indices = tiles(iht, 8, [3.6, 7.21])``
The second argument (8) is the number of offset tilings. The point will fall in one tile in each of the 8 tilings, so the returned list will have exactly eight positive integers in it, all between 0 and one less than the size of iht. In general, the indices may be something like [374, 971, 246, 435, 23, 739, 336, 827], though in this special case, if this is the first time you've used iht, the indices will be [0, 1, 2, 3, 4, 5, 6, 7]. You are guaranteed that all of these exact indices will be returned if ever in the future you tile this same point. If you tile a new point within a distance of about 1 in both x and y directions, then the returned indices will have some in common with this point. Thus:

``>>> iht = IHT(1024)>>> tiles(iht, 8, [3.6, 7.21])[0, 1, 2, 3, 4, 5, 6, 7]>>> tiles(iht, 8, [3.7, 7.21])         # a nearby point[0, 1, 2, 8, 4, 5, 6, 7]               # differs by one tile>>> ``tiles(iht, 8, [4, 7])              # while a farther away point[9, 10, 11, 8, 4, 12, 6, 7]`            # differs by 4 (or 5) tiles`>>> tiles(iht, 8, [-37.2, 7])          # and a point more than one away in any dim[13, 14, 15, 16, 17, 18, 19, 20]       # will have all different tiles``

As the last example shows, far away points have no tiles in common. It also shows that the idea of a 10 by 10 space is really just in our minds. As far as the software is concerned there is a 2D plane stretching to infinity in all directions. (Of course if you visit too much of the space you will run out of indices -- you only have as many as you specified with the size of iht.)

# Fleshing out the example

Suppose you wanted to do some supervised learning of a two dimensional surface. That is, you have data of examples of the form (x, y, z), where x,y are locations in the plane and z is the desired height at those coordinates. Let's say x and y run each from 1 to 3, and that you would like to use, again, 10 x 10 grid tilings. Because the tile coding software always partitions at the integer boundaries, you will need to scale your points to a 10 x 10 region. It is almost always useful to make your own interface routine, here called `mytiles`, that takes your data in its original form and converts its scale so that the tiles can have a width of one (the absolute values are not important). Here is the complete code, including learning:
``` ```
```from tiles3 import tiles, IHT maxSize = 2048```
```iht = IHT(maxSize)```
```weights = *maxSize```
```numTilings = 8```
```stepSize = 0.1/numTilings```

```def mytiles(x, y):```
```    scaleFactor = 10.0/(3-1)```
```    return tiles(iht, numTilings, list(x*scaleFactor,y*scaleFactor))```

```def learn(x, y, z):```
```    tiles = mytiles(x, y)```
```    estimate = 0```
```    for tile in tiles: ```
```        estimate += weights[tile]                  #form estimate```
```    error = z - estimate```
```    for tile in tiles: ```
```        weights[tile] += stepSize * error          #learn weights```

```def test(x, y):```
```    tiles = mytiles(x, y)```
```    estimate = 0```
```    for tile in tiles: ```
```        estimate += weights[tile]```
```    return estimate ```

You can then train using `learn` and obtain estimates using `test`.

# Arguments

The full tiles routine takes three required arguments and one optional argument:

```tiles(iht, numTilings, floats, ints) ```

The first argument is either an index hash table (next section) or a positive integer specifying the upper range of returned indices. In the latter case, pseudo-random hashing is done from the beginning without a hash table. This is the most computationally efficient form of tile coding and is necessary for very large problems, but we don't recommend you start with it.

If the first argument is a positive integer, then for best results it should be a large power of two.

For best results, the second argument, numTilings, should be
a power of two greater or equal to four times the number of floats. The third argument is a list or tuple of floating point numbers to be tiled.

The
fourth argument, `ints`, is optional, defaulting to None. It is a list of integers that will also be tiled; all distinct integers will result in different tilings. In reinforcement learning, discrete actions are often provided here. More generally, if multiple groups of tilings (multiple calls to `tiles`) are used with the same IHT, then each should differ in the first element of `ints`, as described later.

# Index hash tables

An index hash table is a data structure providing an interface to an underlying hash table (a dictionary, in Python) that efficiently keeps track of which indices have been used so far, gives you new ones when asked, and warns you when you have asked for too many. Basically, it makes sure that every new tile you encounter is given a separate index up to the provided size. The interface routines are:

```    IHT(size) ```Make a new index hash table that is guaranteed to never produce an index larger than size. If you do ask for too many indexes, it prints a warning message and then starts reusing old indexes in a semi-random but reliable manner. Be generous with size.

```    iht.count ```The number of indices currently used. All used indices will be between 0 and one less than this number. The count is alway less than the size, and only count have been used, so knowing the count can often speed operations that only need be done on the used features (e.g., eligibility traces).

```    iht.fullp() ```Returns True iff the index hash table is full.

```    iht.overfullCount ```The number of indices asked of the index hash table over and beyond its capacity. This starts at zero, and a warning message is printed when it becomes 1.

```    iht.size ```The size of the index hash table.

Next we describe examples, gradually increasing in complexity, of the use of the tiles routine.

## Example: 32 4x4 Rectangular Grid-Tilings

In the simplest case, we form simple grid-like tilings. Suppose we are interested in the unit square over two real variables x and y. We wish to tile it into 4x4 tilings (i.e., each tile is .25 by .25). This will create the possibility of generalization between any two input vectors in which both components are within .25 of each other. `tiles` assumes unit generalization in its inputs, so we arrange for .25 generalization by scaling x and y before calling tiles:

```def mytiles(x, y): ```
```    return tiles(iht, 32, list(x*4,y*4))```

Despite this fairly broad generalization, we want the ability to make fine distinctions if required. For this we need many tilings, say 32, as specified above. This will give us an ultimate resolution of .25/32=.0078, or better than 1%.

Finally, we have to select the size of the index hash table. This is the range of tile indices (in [0,size)) that will be returned. First we consider the "natural" memory size, as if we were doing no hashing. This is simply the total number of tiles in all the tilings, or (4+1)*(4+1)*32=800 in our case. This is not a large number and might be used directly as the `size` of the index hash table. If you don't use an IHT, then you should use something considerable larger and a power of two.

## Complex (Multiple-Call) Variations

Sometimes one can get by with a single kind of tiling, as produced by a single call to `tiles` (note that a single call creates multiple tillings, but these are all simple offset copies of each other). But for many other cases one needs to be a little more sophisticated. A great deal can be done to control the generalization and to limit the memory requirements by using multiple tilings through multiple calls to `tiles`. We consider some of these below.

Suppose as previously you have input vectors in the unit square, but you want to generalize between input vectors that are similar along the x dimension OR similar along the y dimension. To do this you need two kinds of tilings, one producing stripes horizontally and one vertically. Suppose you want to make 32 tilings of each type, and combine the results in one 64-element list of tiles:

`tiles(iht, 32, [x/0.25], ``[``0``]````) + ``````tiles(iht, 32, [y/0.25], ````[``1``]``)`

Notice that we change the final integer input from 0 to 1 in the second call. Whenever using multiple calls like this, it is a good idea to give them each a different setting for the integer arguments.

There are a lot of possibilities here. One could use two groups of tilings, one for each single dimension, as illustrated above, PLUS an additional group of tilings that depends on both dimensions. This would create a system that generalizes partly along each dimension separately, but also on the two dimensions conjunctively, so that it could if needed learn to give a special response to a particular pair of x,y values. Another possibility would be to choose some groups of tilings with coarse generalization and some with fine.

One can control the tendency to generalize in the different ways given by each group of tilings by controlling the number of tilings in each group. For example, if you wanted discrimination to be stronger along x rather than y, then you would ask for more x tillings (more than 32) in the above example, or fewer y tilings.

## Getting Consistent Hashing When Not Using an Index Hash Table

In many applications one wants to make sure hashing, a random process, is done in a consistent way, e.g., across different runs of a program. The hashing done by the routines described here (when not using an index hash table) is determined by some tables filled with random numbers the first time the routine is called. To get consistent hashing you need only ensure that the random number generator is in a consistent state for the first call. A convenient way to do this is to make a dummy first call during an initialization portion of your program. For example, an early portion of your C++ code could include:

`int dummy_tiles;`
`float dummy_vars;`

`srand(0);`
`tiles(dummy_tiles,1,1,dummy_vars,0); // initializes tiling code`

In lisp, you would use an implementation-dependent procedure to set `*random-state*` in a reproducable manner, then call `(tiles nil 1 1)`.

In Python, you can seed the random number generator with `random.seed(number)`, and then import the `tiles` module.

## Lisp Versions

The LISP version has almost the same interface as the Python:

```    (tiles iht num-tilings float &optional ints) `````` ```

```    (tiles-wrap iht num-tilings float wrap-widths &optional ints)```

```    (make-iht size)     (iht-count iht)     (iht-fullp iht)     (iht-overfull-count iht)     (iht-size iht)```

## Python Calling C Versions

In order to improve the efficiency of the tiling functions, but still have the flexibility of Python, we would also like a version available for Python which calls the C version of tiles. The calling sequences should be exactly the same as for the Python versions.

## Wrap-around Versions

The tilings we have discussed so far stretch out to infinity with no need to specify a range for them. This is cool, but sometimes not what you want. Sometimes you want the variables to wrap-around over some range. For example, you may have an angle variable that goes from 0 to 2pi and then should wrap around, that is, you would like generalization to occur between angles just greater than 0 and angles that are nearly 2pi. To accomodate this, we provide some alternative, wrap-around versions of the tiling routines.

These versions take an additional input, `wrapwidths`, which parallels the float structure (array or list), and which specifies for each float the width of the range over which it wraps. If you don't want a float to wrap, it's wrapwidth should be False (in LISP, nil). The wrapwidth is in the same units as the floats, but should be an integer. This can be confusing, so let's do a simple 1D example. Suppose you have one real input, an angle theta. Theta is originally in radians, that is, in [0,2pi), which you would like to wrap around at 2pi. Remember that these functions all have their tile boundaries at the integers, so if you passed in theta directly there would be tile boundaries at 0, 1, and 2, i.e., just a few tiles over the whole range, which is probably not what you want. So let's say what we want! Suppose we want tilings with 10 tiles over the [0,2pi) range. Then we have to scale theta so that it goes from 0 to 10 (instead of 0 to 2pi). One would do this by multiplying theta by 10/2pi. With the new scaled theta, the wrapping is over [0,10), for a wrap_width of 10. Here is the Python code for this case, assuming we want 16 tilings over the original [0,2pi) range with a memory size of 512:

``` ```
```from tiles3 import tiles, IHT iht = IHT(512) tiles(iht, 16, [theta * 10/(2*3.14159)], )```

Note that the code would be exactly the same if the original range of theta was [-pi,+pi]. Specifying the complete range of wrapping (rather than just the width) is not necessary for the same reason as we did not need to give a range at all in the previous routines.

As another example, suppose you wanted to cover the 2pi x 3 rectangular area with 16 tilings, each with a width of generalization one-tenth of the space in each dimension, and with wrap-around in the first dimension but not the second. In C you would do

```from tiles3 import tiles, IHT iht = IHT(512) tiles(iht, 16, [x/(3*0.1), y/(2*3.14159*0.1)], [False, 10])```

In Python (and the Python calling C version), the wrapping routines are:

```tileswrap(numtilings, memsize, floats, wrapwidths, ints)```