RL Interface and Benchmarks
The aim of this page
is to keep the members of the RL Benchmark and Interface team up to
date with recent developments, and to provide general interest groups with
information, standards, documentation, and product availability.
Edited by Adam White
More to come soon....
when: 2:30pm ~ October 7th 2005
where: RLAI Lab
- update group about our recent public release
- discuss documentation and its maintenance
- Should we support Windows? - currently only Unix/Mac/Linux
- and is it even possible?
- Phase II implementation
- Mark and I have a proposal for how this might work
- Any workshop considerations
September 15th 2005
- everything Rich and I (Adam) talked about Tuesday
- de_init for agents and environments
- web page
- the group's feedback (or lack thereof)
During this meeting we fleshed out the overall framework of the system,
talked about compatibility concerns, and finalized some macro issues.
Below are some notes from the discussion (thanks Anna :)
How it should work
- if you wanted to run a competition, you would want to restrict access
to the world
- the main model would be going to the repository, getting the programs
- like the UCI database - not monolithic; just pick which pieces you use
- client/server issue - can my code run on your machine?
- standard benchmarks which are visible, then competitions where access
is restricted
- provide benchmarks in various forms (eg training set, with labels
changed or order changed) - so for any table-lookup reinforcement learning
problem, permute the actions (a sketch of this idea follows the list below)
- focus on benchmarks, run across machines
- related efforts: RL Bench (CMU), CL^2 in Germany, and us
- RL Bench communicates over a text-based protocol
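As a concrete (and entirely hypothetical) sketch of the action-permutation
idea: wrap a table-lookup environment so that the agent's action indices
are relabeled before reaching the real environment. The function
env_step_raw and the constant NUM_ACTIONS are stand-ins for this sketch,
not part of any actual interface.

    /* Hypothetical sketch: same world, relabeled actions.  A fixed
       random permutation is applied to every action the agent takes. */
    #include <stdlib.h>

    #define NUM_ACTIONS 4

    static int perm[NUM_ACTIONS];

    int env_step_raw(int action);   /* stand-in for the real benchmark */

    /* Build a random permutation of the action labels (Fisher-Yates). */
    void init_action_permutation(unsigned seed) {
        srand(seed);
        for (int i = 0; i < NUM_ACTIONS; i++) perm[i] = i;
        for (int i = NUM_ACTIONS - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
        }
    }

    /* What the agent calls: identical problem, different action labels. */
    int env_step(int action) {
        return env_step_raw(perm[action]);
    }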
Agents and environments can be written in any language, as long as they
comply with the RL Interface (7.0)
(agent_start(), agent_step(), env_start(), env_step(), etc.). The
agent and environment can also live in another executable (communicating
by pipes), or on another machine (communicating over network sockets).
RL_Interface written in C
RL_Interface - calls the RL Interface 7.0 functions (agent_start(),
agent_step(), etc.), but the agent and environment it calls may be
adapters instead of actual agents or environments. Adapters translate
between languages, or communicate over pipes/sockets.
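To make the calling convention concrete, here is a minimal sketch of what
the interface functions might look like in C. The exact types, the Step
struct, and the init/de_init functions are assumptions for illustration,
not the finalized specification:

    /* rl_interface.h -- hypothetical sketch, not the finalized spec.
       Types shown are for a Level 1 world (integer observations
       and actions). */
    typedef int    Action;
    typedef int    Observation;
    typedef double Reward;

    typedef struct {
        Reward      r;
        Observation o;
        int         terminal;   /* nonzero when the episode has ended */
    } Step;

    /* Agent side -- may be a real agent or an adapter */
    void   agent_init(const char *task_spec);
    Action agent_start(Observation o);
    Action agent_step(Reward r, Observation o);
    void   agent_end(Reward r);
    void   agent_de_init(void);

    /* Environment side -- may be a real environment or an adapter */
    const char *env_init(void);      /* returns the task_spec string */
    Observation env_start(void);
    Step        env_step(Action a);
    void        env_de_init(void);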
- putting a network communication layer in between
- we create libraries that translate; for example: Lisp
agent -> network -> RL Interface in C (see the sketch below)
- an agent could also communicate with the RL_Interface directly
(opening its own sockets), but then it depends on following our standards
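Here is a minimal sketch of the pipe idea on the interface side: what
RL_Interface links against looks like an agent, but each call is forwarded
to the real agent process. The streams to_agent/from_agent and the
one-line text protocol are invented here for illustration.

    /* Hypothetical pipe adapter: presents agent_step() to the
       RL_Interface, but the real agent lives in another process. */
    #include <stdio.h>

    extern FILE *to_agent;     /* write end of the pipe to the agent */
    extern FILE *from_agent;   /* read end of the pipe from the agent */

    int agent_step(double reward, int observation) {
        int action;
        /* ship the step across the pipe as one line of text */
        fprintf(to_agent, "step %f %d\n", reward, observation);
        fflush(to_agent);
        /* block until the remote agent answers with an action */
        if (fscanf(from_agent, "%d", &action) != 1)
            action = 0;        /* protocol error: fall back to action 0 */
        return action;
    }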
How to distribute RL_Interface?
- want to be able to compile it with C for speed
- Should be modular (users choose what they need)
- Set of agents and environments to choose from
Need something outside of the agent and environment to track statistics
- Rich - RL Interface has various benchmarks you could run
- average reward, etc.
- big win if the person writing the agent code doesn't have to know
anything about the interface - sets the number of episodes and the
performance measure without doing anything extra
- trivial main, benchmark mains; you can write your own mains which
determine the experimental setup, GUI, etc. (a sketch of a trivial main
follows)
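For instance, a trivial main might be no more than the following. RL_init,
RL_episode, and RL_average_reward are placeholder names invented for this
sketch; the real benchmark functions may differ.

    /* A hypothetical "trivial main": pick a number of episodes and a
       performance measure, and nothing else. */
    #include <stdio.h>

    void   RL_init(void);            /* placeholder declarations */
    void   RL_episode(void);
    double RL_average_reward(void);

    int main(void) {
        RL_init();
        for (int i = 0; i < 100; i++)    /* number of episodes */
            RL_episode();
        printf("average reward: %f\n", RL_average_reward());
        return 0;
    }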
------------End of Notes------------------------------
Below is a diagram of the structure of the new RL-interface. This is
sure to change slightly, but it gives a good overall sense of the
ambition and direction of the project (thanks again Anna :).
Below are my (Mark's) thoughts on the task_spec and data types for
actions, observations, and rewards:
>In the meeting it was mentioned that the task_spec could specify a
>"level" to which it complies. I think this might be the best way
>to go. That way it can be pretty simple (lower levels) while still
>allowing for more complication (higher levels).
>So, then, the first level could be that all the data types are
>integers and the task_spec simply specifies how many states and
>actions there are.
Level 1 Environments:
Eg: "1,5,2,-1" for a world with 5 states, 2 actions, and -1 being the
terminal state.
>Level 2 could be for a continuous world. All data types would
>be floating point (should actions be? Continuous actions seem
>more complicated; maybe best to leave to a higher level?). The
>task_spec would specify the range of the state space (and action
>space?) (Does the range of the reward space need to be supplied?
>Does that add anything?).
Level 2 Environments:
Eg: "2,42,4242,0,7,-1" for a world with states ranging from 42 to 4242
and actions numbered from 0 to 7, with -1 being the terminal state.
>Level 3 could be for a multi-dimensional discrete world. So,
>the state and action space would be an array of integers. The
>task_spec would specify the number of dimensions, followed by
>the size of each dimension (range of values in that dimension).
Level 3 Environments:
(newlines for email clarity, not part of the specification)
Eg: "3,2,0,10,1,6,3,1,3,1,3,1,4,0,1" would be a world with 2 state
dimensions, the first ranging from 0 to 10, the second from 1 to 6. 3
action dimensions, the first ranging from 1 to 3, the second also from 1
to 3, and the third from 1 to 4. The terminal state would have the first
dimension be 0 and the second dimension be 1.
The state would be terminal if all of the state variables equal the
terminalState values. Something like -MAX_INT could be used in, say,
the first state variable to indicate that a state is terminal.
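A small sketch of this terminal test, under one reading of the -MAX_INT
idea (the sentinel in the first state variable signals termination
directly); the function name is invented:

    /* Hypothetical Level 3 terminal test. */
    #include <limits.h>

    int is_terminal(const int *state, const int *terminal_state,
                    int num_state_dims) {
        if (state[0] == INT_MIN)    /* -MAX_INT sentinel: forced terminal */
            return 1;
        for (int i = 0; i < num_state_dims; i++)
            if (state[i] != terminal_state[i])
                return 0;           /* some dimension differs: not terminal */
        return 1;                   /* matched the terminalState exactly */
    }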
>Level 4 could be for a multi-dimensional continuous world. State
>and action space would be an array of floating point. task_spec
>would be the # of dimensions followed by the range of each.
Level 4 Environments:
Same as level 3, but starting with a 4 not a 3.
>And, finally, level 5 could be for a general multi-dimensional
>world. So, the task_spec would specify the # of dimensions, followed
>by whether or not that dimension is continuous, followed by the
>range in that dimension. (for a world where a possible action
>might be [5, 0.3, 2], for example).
Level 5 Environments:
Eg: "5,3,1,0,1,0,1,10,1,5,8,2,0,1,8,1,0,3.141,0.5,7,7"
Where the continuous flags are simply 1 or 0: 1 for a continuous
state/action and 0 for a discrete state/action.
This would be a world with 3 state dimensions, the first being continuous
values from 0 to 1, the second being discrete values from 1 to 10, and the
third being continuous values from 5 to 8. There are 2 action dimensions,
the first being discrete values from 1 to 8, and the second being continuous
values from 0 to 3.141. The terminal state would be 0.5, 7, 7.
>For levels 2 and above the terminal state would need to be specified
>in the task_spec as well.
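To tie the levels together, here is a hypothetical parser for the Level 1
form of these strings; the struct and function names are invented, and
Levels 2-5 would need their own variable-length parsing.

    /* Hypothetical parser for a Level 1 task_spec, "1,n,m,t":
       n states, m actions, t the terminal state. */
    #include <stdio.h>

    typedef struct {
        int level;
        int num_states;
        int num_actions;
        int terminal_state;
    } TaskSpecL1;

    int parse_task_spec_level1(const char *spec, TaskSpecL1 *out) {
        if (sscanf(spec, "%d,%d,%d,%d", &out->level, &out->num_states,
                   &out->num_actions, &out->terminal_state) != 4)
            return -1;   /* malformed string */
        if (out->level != 1)
            return -1;   /* Levels 2-5 need different parsing */
        return 0;
    }

    /* Eg: parse_task_spec_level1("1,5,2,-1", &ts) yields 5 states,
       2 actions, and terminal state -1. */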
Below is the current (untested) code. In this code the task_spec is
assumed to be of the form "n m", where n is the number of states and m is
the number of actions.
- Global definitions for RL_Interface component
- The interface to the environment that the RL_Interface expects
- The interface to the agent that the RL_Interface expects
- A rudimentary Sarsa agent
- The RL_Interface code itself (assumes Level 1 above)
- The interface to the RL_Interface
- Mine Problem - an environment with observations, written in C++
- main.c - includes the main function and prints the time taken by the
completed run
- An adapter for an agent written in Python
- An Adapter for an environment written in Python
- A hall environment written in Python
- A random agent written in Python
- Code to access a Python agent over XML-RPC. Agent 2 is what the
- A makefile to compile the Mine Problem with the Sarsa Agent above.
- All of the code in a consistent version in one package.
This is what we sent out to some of the people who will be attending
and contributing to the NIPS workshop: