The purpose of this web page is to describe the intended contents of the RL toolkit. The toolkit is a collection of tools, examples, and demos of and for reinforcement learning, developed by the RLAI group. It is hoped that these tools will be useful for those learning, teaching, or using reinforcement learning.
Converted to Python 3.5 in Oct 2016. -rss
RL Toolkit Wishlist
(the contents of RL Toolkit 1.0 are discussed below)
- Function Approximation
- Tile coding in various forms (with and without collision tables; a minimal sketch follows this list)
- Linear FA code (linear and averaging function approx)
- Kanerva coding
- backprop neural networks
- Cascade correlation
- IDBD
- graphical displays showing stability, shape of tiles, timings, etc
- Test suite for the FA code
- 2D space - see how it generalizes, given a training set with
- few points
- zillions of points (eg a sine wave, sampled densely)
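For concreteness, here is a minimal sketch of tile coding with linear function approximation over a 2D input, trained on a densely sampled sine-wave surface as in the test suggested above. The tile widths, offsets, and index scheme are assumptions for illustration, not the toolkit's actual tile-coding code (which also supports collision tables).

import numpy as np

NUM_TILINGS = 8          # number of offset tilings
TILES_PER_DIM = 10       # tiles along each input dimension
                         # inputs assumed to lie in [0, 1) on each dimension

def active_tiles(x, y):
    """Return one active tile index per tiling for input (x, y)."""
    indices = []
    for t in range(NUM_TILINGS):
        offset = t / NUM_TILINGS / TILES_PER_DIM   # stagger each tiling
        col = int((x + offset) * TILES_PER_DIM) % TILES_PER_DIM
        row = int((y + offset) * TILES_PER_DIM) % TILES_PER_DIM
        indices.append(t * TILES_PER_DIM**2 + row * TILES_PER_DIM + col)
    return indices

weights = np.zeros(NUM_TILINGS * TILES_PER_DIM**2)

def predict(x, y):
    """Linear FA: the prediction is the sum of the active tiles' weights."""
    return sum(weights[i] for i in active_tiles(x, y))

def learn(x, y, target, alpha=0.1):
    """Move the prediction toward the target by gradient descent."""
    error = target - predict(x, y)
    for i in active_tiles(x, y):
        weights[i] += alpha / NUM_TILINGS * error

rng = np.random.default_rng(0)
for _ in range(10000):                             # densely sample a sine surface
    x, y = rng.random(), rng.random()
    learn(x, y, np.sin(2 * np.pi * x) + np.cos(2 * np.pi * y))
print("f(0.3, 0.7) ~", predict(0.3, 0.7))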
- RL Algorithms
- Q-learning
- Sarsa(lambda) (a sketch combining Sarsa(lambda), traces, and epsilon-greedy follows this list)
- TD(lambda)
- Dyna
- prioritized sweeping
- Policies
- epsilon-greedy
- Boltzmann (Gibbs)
- R-max
- interval estimation
- etc
- Action selection strategies
- actor-critic
- policy gradient
- REINFORCE
- lookahead search
- discounted, episodic, and average-reward formulations
- Eligibility traces
- accumulating
- replacing
- naive and efficient implementations
- Monte Carlo
- Importance Sampling
- Methods for continuous action spaces
- Methods for large discrete action spaces
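As one concrete point in this space, here is a minimal sketch of tabular Sarsa(lambda) with epsilon-greedy action selection and a switch between accumulating and replacing traces. The tiny chain environment and the parameter values are made up for illustration; the trace update shown is the naive implementation that touches every state-action pair, with the efficient non-zero-traces-only variant noted in a comment.

import numpy as np

N_STATES, N_ACTIONS = 10, 2       # simple chain; actions: 0 = left, 1 = right
GAMMA, ALPHA, LAMBDA, EPSILON = 0.9, 0.1, 0.8, 0.1
REPLACING = True                  # False gives accumulating traces

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

def epsilon_greedy(s):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

def env_step(s, a):
    """Walk the chain; reward 1 for reaching the right end (terminal)."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for episode in range(100):
    e = np.zeros_like(Q)          # eligibility traces, reset each episode
    s = 0
    a = epsilon_greedy(s)
    for _ in range(1000):         # cap episode length
        s2, r, done = env_step(s, a)
        a2 = epsilon_greedy(s2)
        delta = r - Q[s, a] + (0.0 if done else GAMMA * Q[s2, a2])
        if REPLACING:
            e[s, a] = 1.0         # replacing traces
        else:
            e[s, a] += 1.0        # accumulating traces
        Q += ALPHA * delta * e    # naive update touching every (s, a) pair;
                                  # an efficient version updates non-zero traces only
        e *= GAMMA * LAMBDA
        if done:
            break
        s, a = s2, a2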
- Dynamic Programming Stuff
- Value Iteration (tabular +) (a minimal sketch follows this list)
- Generalized Policy Iteration (tabular +)
- Planning with simulators/trajectories
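A minimal sketch of tabular value iteration, assuming the MDP is given as explicit transition and reward arrays. The three-state MDP here is invented for illustration only.

import numpy as np

# P[s, a, s2] = transition probability, R[s, a] = expected immediate reward.
P = np.array([[[0.8, 0.2, 0.0], [0.0, 1.0, 0.0]],
              [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]],
              [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
GAMMA, THETA = 0.9, 1e-8

V = np.zeros(3)
while True:
    # Backup: Q(s,a) = R(s,a) + gamma * sum_s2 P(s,a,s2) V(s2)
    Q = R + GAMMA * P @ V         # shape (states, actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < THETA:
        break                     # values have converged
    V = V_new
policy = Q.argmax(axis=1)         # greedy policy from the converged values
print("V* =", V, "policy =", policy)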
- Test Suite for progression/regression testing ("Gold" standard)
- Examples and demos
- lots of agents
- lots of environments
- examples - Acrobot, Mountain Car, Maintenance example, Gridworld, Puddle World, Puck World, Pole, Robosoccer
- fa demo
- Enhance gridworld demo to see more things at once, eg traces
- Do mountain car really well, with the right abstractions, etc
- plug in different kinds of FAs (tile coding, etc)
- Bakeoff (Benchmark) tasks
- create environments and run algorithms against them all
- (random generation process)
- Software for experimenting with variations in the following (a minimal sketch follows this list):
- parameter settings
- feature sets
- algorithmic implementations
- etc
- Automated methods for choosing representation, e.g. unsupervised
- Integration with auditory and visual (graphical) displays
- Ability to interconnect agents and environments across the internet
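A minimal sketch of the kind of experimentation support imagined above: sweep an experiment over a grid of parameter settings and collect the results. The run_experiment function is a hypothetical stand-in for any agent-environment run.

import itertools

def run_experiment(alpha, lam):
    """Placeholder: run a learning experiment, return a performance score."""
    return -abs(alpha - 0.1) - abs(lam - 0.8)     # fake score for illustration

alphas = [0.01, 0.05, 0.1, 0.5]
lambdas = [0.0, 0.4, 0.8, 1.0]
results = {}
for alpha, lam in itertools.product(alphas, lambdas):
    results[(alpha, lam)] = run_experiment(alpha, lam)

best = max(results, key=results.get)              # best-scoring setting
print("best (alpha, lambda):", best)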
This is a bit of a wish list, but I hope the ambition it expresses will keep us from getting bogged down. It is also incomplete; you might consider adding some more wishes to it if they spring to mind. I hope you will also consider being more directly a part of creating the RL toolkit. If we work together we can not only get more done, but likely produce a better product as well, one that will be more useful to ourselves and to others.
RL Toolkit 1.0 Projected Contents
- assuming a small number of discrete actions and complete observability
- tabular, state aggregation, and linear function approximation
- Objective = linear Sarsa(lambda), tabular DP
- RL interface (RLinterface, rl.steps, rl.episodes, etc) - a hypothetical usage sketch follows this list
- Tile coding (with and without collision tables)
- Linear function approximation
- GUI demo for function approximation with tile coding
- Eligibility traces (tabular, linear) - accumulate/replace options, non-zero-traces-only option
- Q-learning
- Sarsa(lambda)
- TD(lambda)
- Policies
- Test suite and demos (non-GUI except for FA) for the above
- for progression/regression testing (standards)
- Examples and demos and benchmarks for above
- mountain car
- maintenance example
- gridworld
- Tabular dynamic programming
- And perhaps also:
- some examples from the book (some already done in Lisp or C++)
- acrobot
- blackjack
- examples with and without objects
- examples with and without the RL interface
- bunch of agents and environments (to mix and match?)
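To make the projected RL interface concrete, here is one plausible shape for it. This is a hypothetical sketch only: the names RLinterface, steps, and episodes come from the list above, but the signatures, the function-based agent/environment convention, and the reward handling are assumptions, not the toolkit's actual API.

import random

class RLinterface:
    """Pairs an agent function with an environment function and runs them."""
    def __init__(self, agent_fn, env_fn):
        self.agent_fn, self.env_fn = agent_fn, env_fn

    def steps(self, n):
        """Run n agent-environment interaction steps."""
        state = self.env_fn()                    # call with no action to (re)start
        for _ in range(n):
            action = self.agent_fn(state)
            state, reward = self.env_fn(action)  # a real interface would also
                                                 # feed the reward back to the agent

    def episodes(self, num, max_steps):
        """Run num episodes, each capped at max_steps steps."""
        for _ in range(num):
            self.steps(max_steps)

# Placeholder agent and environment, for illustration only.
def agent_fn(state):
    return random.choice([0, 1])                 # random binary action

def env_fn(action=None):
    if action is None:
        return 0                                 # initial state
    return random.randint(0, 9), 0.0             # (next state, reward)

RLinterface(agent_fn, env_fn).episodes(10, 100)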
-- Main.RichSutton - 14 Mar 2004