Reinforcement Learning and
Artificial
Intelligence (RLAI) |
|
Tile
Coding Software -- Reference Manual, Version 3.0 |
IHT(size)
, e.g., iht = IHT(1024)
Suppose you wish to tile a
two-dimensional x,y space, where each
dimension runs from 0 to 10. The tiling software will partition this
using multiple 10 by 10 partitions, each offset from the other (it is
10 by 10 because the software partitions at integer boundaries). You
obtain a
list of tile indices for a point (here the point (x, y) = (3.6, 7.21))
by callingindices = tiles(iht, 8, [3.6, 7.21])
The
second argument (8) is the number of offset tilings. The point will
fall in one tile in each of the 8 tilings, so the returned list will
have exactly eight positive integers in it, all between 0 and one less
than the size of iht. In general, the indices may be something like
[374, 971, 246, 435, 23, 739, 336, 827], though in this special case,
if this is the first time you've used iht, the indices will be [0, 1,
2, 3, 4, 5, 6, 7]. You are guaranteed that all of these exact indices
will be returned if ever in the future you tile this same point. If you
tile a new point within a distance of about 1 in both x and y
directions, then the returned indices will have some in common with
this point. Thus:
>>> iht = IHT(1024)
>>> tiles(iht, 8, [3.6, 7.21])
[0, 1, 2, 3, 4, 5, 6, 7]
>>> tiles(iht, 8, [3.7, 7.21]) # a nearby point
[0, 1, 2, 8, 4, 5, 6, 7] # differs by one tile
>>>tiles(iht, 8, [4, 7]) # while a farther away point
# differs by 4 (or 5) tiles
[9, 10, 11, 8, 4, 12, 6, 7]
>>> tiles(iht, 8, [-37.2, 7]) # and a point more than one away in any dim
[13, 14, 15, 16, 17, 18, 19, 20] # will have all different tiles
mytiles
,
that
takes your data in its original form and converts its scale so that
the tiles can have a width of one (the absolute values are not
important).
Here is the complete code, including learning:
from
tiles3 import tiles, IHT
maxSize = 2048
iht
= IHT(maxSize)
weights
= [0]*maxSize
numTilings
= 8
stepSize
= 0.1/numTilings
def
mytiles(x, y):
scaleFactor = 10.0/(3-1)
return tiles(iht, numTilings, list(x*scaleFactor,y*scaleFactor))
def
learn(x, y, z):
tiles = mytiles(x, y)
estimate = 0
for tile in tiles:
estimate +=
weights[tile]
#form estimate
error = z - estimate
for tile in tiles:
weights[tile] += stepSize *
error #learn
weights
def
test(x, y):
tiles = mytiles(x, y)
estimate = 0
for tile in tiles:
estimate +=
weights[tile]
return estimate
learn
and obtain estimates using
test
.tiles(iht, numTilings, floats, ints)
ints
, is optional, defaulting to None. It is a
list of integers that will also be tiled; all distinct integers will
result in different tilings. In reinforcement learning, discrete
actions are often provided here. More generally, if multiple groups of
tilings (multiple calls to tiles
)
are used with the same IHT, then each should differ in the first
element of ints
,
as described later.An index hash table is a data structure providing an interface to an
underlying hash table (a dictionary, in Python) that efficiently keeps
track of which indices have been used so far, gives you new ones when
asked, and warns you when you have asked for too many. Basically, it
makes sure that every new tile you encounter is given a separate index
up to the provided size. The interface routines are:
IHT(size)
Make a new index hash table that is guaranteed to never produce
an index larger than size. If you do ask for too many indexes, it
prints a warning message and then starts reusing old indexes in a
semi-random but reliable manner. Be generous with size.
iht.count
The number of indices currently used. All used indices
will be between 0 and one less than this number. The count is alway
less than the size, and only count have been used, so knowing the count
can often speed operations that only need be done on the used features
(e.g., eligibility traces).
iht.fullp()
Returns True iff the index hash table is full.
iht.overfullCount
The number of indices asked of the index hash table over and
beyond its capacity. This starts at zero, and a warning message is
printed when it becomes 1.
iht.size
The size of the index hash table.
Next we describe examples, gradually increasing in complexity, of the use of the tiles routine.
tiles
assumes unit generalization in its inputs, so we arrange for .25
generalization by scaling x and y before calling tiles:
def
mytiles(x, y):
return tiles(iht, 32, list(x*4,y*4))
Despite this fairly
broad generalization, we want the ability to make fine
distinctions if required.
For this we need many tilings, say 32, as specified above.
This will give us an ultimate resolution of .25/32=.0078, or better
than 1%.
Finally, we have to select the size of the index hash table. This is
the range
of tile indices (in [0,size)) that will be returned. First we consider
the
"natural" memory size, as if we were doing no hashing. This is simply
the total number of tiles in all the tilings,
or (4+1)*(4+1)*32=800 in our case. This is not a large number and might
be used
directly as the size
of the index hash table.
If you don't use an IHT, then you should use something considerable
larger and a power of two.
Sometimes one can get by with a single kind of tiling, as
produced by a
single call
to tiles
(note that a single call creates multiple
tillings, but these are
all simple offset copies of each other). But for many other cases one
needs to be a little more sophisticated.
A great deal can be done to control the generalization and to limit the
memory requirements
by using multiple tilings through multiple calls to tiles
.
We consider some of
these below.
Suppose as previously you have input vectors in the unit square,
but
you want to generalize
between input vectors that are similar along the x dimension OR
similar along the y
dimension. To do this you need two kinds of tilings, one
producing stripes horizontally and one vertically. Suppose you want to
make 32 tilings of each type, and combine the results
in one 64-element list of tiles:
tiles(iht, 32, [x/0.25],
[
0
]
)
+
tiles(iht,
32, [y/0.25],
[
1
]
)
There are a lot of possibilities here. One could use two groups of tilings, one for each single dimension, as illustrated above, PLUS an additional group of tilings that depends on both dimensions. This would create a system that generalizes partly along each dimension separately, but also on the two dimensions conjunctively, so that it could if needed learn to give a special response to a particular pair of x,y values. Another possibility would be to choose some groups of tilings with coarse generalization and some with fine.
One can control the tendency to generalize in the different ways given by each group of tilings by controlling the number of tilings in each group. For example, if you wanted discrimination to be stronger along x rather than y, then you would ask for more x tillings (more than 32) in the above example, or fewer y tilings.
In many applications one wants to make sure hashing, a random process, is done in a consistent way, e.g., across different runs of a program. The hashing done by the routines described here (when not using an index hash table) is determined by some tables filled with random numbers the first time the routine is called. To get consistent hashing you need only ensure that the random number generator is in a consistent state for the first call. A convenient way to do this is to make a dummy first call during an initialization portion of your program. For example, an early portion of your C++ code could include:
int dummy_tiles[1];
float dummy_vars[1];
srand(0);
tiles(dummy_tiles,1,1,dummy_vars,0); // initializes tiling code
*random-state*
in
a reproducable manner, then call (tiles nil 1 1)
.random.seed(number)
, and then import the
tiles
module.The LISP version has almost the same interface as the Python:
(tiles iht num-tilings float &optional
ints)
(tiles-wrap iht num-tilings float
wrap-widths &optional ints)
(make-iht size)
(iht-count iht)
(iht-fullp iht)
(iht-overfull-count iht)
(iht-size iht)
In order to improve the efficiency of the tiling functions, but
still have the flexibility of
Python, we would also like a version available for Python which calls
the C
version of tiles. The calling
sequences should be exactly the same as for the Python versions.
The tilings we have discussed so far stretch out to infinity with no need to specify a range for them. This is cool, but sometimes not what you want. Sometimes you want the variables to wrap-around over some range. For example, you may have an angle variable that goes from 0 to 2pi and then should wrap around, that is, you would like generalization to occur between angles just greater than 0 and angles that are nearly 2pi. To accomodate this, we provide some alternative, wrap-around versions of the tiling routines.
These versions take an additional input, wrapwidths
,
which parallels the float structure (array or list), and which
specifies for each float the width of the range over which it wraps. If
you don't want a float to wrap, it's wrapwidth should
be False (in LISP, nil). The wrapwidth is in the same units as the
floats, but should be an integer. This can be confusing, so let's do a
simple 1D example. Suppose you have one real input, an angle theta.
Theta is originally in radians, that is, in [0,2pi), which you would
like to wrap around at 2pi. Remember that these functions all have
their tile boundaries at the integers, so if you passed in theta
directly there would be tile boundaries at 0, 1, and 2, i.e., just a
few tiles over the whole range, which is probably not what you want. So
let's say what we want! Suppose we want tilings with 10 tiles over the
[0,2pi) range. Then we have to scale theta so that it goes from 0 to 10
(instead of 0 to 2pi). One would do this by multiplying theta by
10/2pi. With the new scaled theta, the wrapping is over [0,10), for a
wrap_width of 10. Here is the Python code for this case, assuming we
want 16
tilings over the original [0,2pi) range with a memory size of 512:
from
tiles3 import tiles, IHT
iht = IHT(512)
tiles(iht, 16, [theta * 10/(2*3.14159)], [10])
Note that the code would be exactly the same if the original range of theta was [-pi,+pi]. Specifying the complete range of wrapping (rather than just the width) is not necessary for the same reason as we did not need to give a range at all in the previous routines.
As another example, suppose you wanted to cover the 2pi x 3
rectangular
area with 16 tilings, each with a width of generalization one-tenth of
the space in each dimension, and with wrap-around in the first
dimension but not the second. In C you would do
from
tiles3 import tiles, IHT
iht = IHT(512)
tiles(iht, 16, [x/(3*0.1), y/(2*3.14159*0.1)], [False, 10])
In Python (and the Python calling C version), the wrapping routines
are:
tileswrap(numtilings,
memsize, floats, wrapwidths, ints)