Reinforcement Learning and Artificial Intelligence (RLAI)
RL Interface and Benchmarks
The aim of this page is to keep the members of the RL Benchmark and
Interface team up to date with recent developments, and to provide
interested groups with information, standards, documentation and
product availability.
Edited by Adam White
Introduction & Motivation
More to come soon....
Next Meeting:
when: 2:30pm ~ October 7th 2005
where: RLAI Lab
Agenda:
- update the group about our recent public release
- discuss documentation and its maintenance
- should we support Windows? (currently only Unix/Mac/Linux - and is it
even possible?)
- Phase II implementation
- Mark and I have a proposal for how this might work/look
- any workshop considerations
Previous meetings:
September 15th 2005
Agenda:
- everything Rich and I (Adam) talked about Tuesday
- de_init for agents and envs
- web page
- the group's feedback (or lack thereof)
August 2nd 2005
During this meeting we fleshed out the overall framework of the system,
talked about compatibility concerns, and finalized some macro issues.
Below are some notes from the discussion (thanks Anna :)
-------------------------------------------
How it should work
- if you wanted to run a competition, you would want to restrict access
to the world
- the main model would be going to the repository, getting the programs
- like UCI database - not monolithic, just pick which of what you use
- client/server issue - can my code run on your machine?
- standard benchmarks which are visible, then competitions where
they're not
- provide benchmark in various forms (eg training set, with labels
changed or order changes) - so any table lookup reinforcement learning
problem, change the actions
- focus on benchmarks, run across machines
- RL Bench (CMU), CL^2 in Germany, us
- RL Bench communicate over a text-based protocol
Software Standard
Mark's Proposal
Agent and environment written in any language, as long as they comply
with the RL Interface (7.0)
(agent_start(), agent_step(), env_start(), env_step(), etc.). Also the
agent and environment can be in another executable (communication by
pipes), or on another machine (network socket communication).
RL_Interface written in C
RL_Interface - calls the RL Interface 7.0 (agent_start(), agent_step(),
etc.), but the agent and environment it calls may be adapters instead
of actual agents or environments. Adapters will translate between
languages, or communicate over pipes/sockets.
- putting a network communication layer in between
- we create libraries that translate (examples): lisp
agent->network->RL Interface, C environment->network->RL Interface
- could communicate within the agent to the RLInterface directly
(opening sockets) but then dependent on following our standards
How to distribute RL_Interface?
- want to be able to compile it with C for speed
- Should be modular (users choose what they need)
- Set of agents and environments to choose from
Need something outside of the agent and environment to track statistics
- Rich - RL Interface has various benchmarks you could run
- average reward, etc.
- big win if the person writing the agent code doesn't have to know
anything about the interface - sets number of episodes and performance
measure without doing anything extra
- trivial main, benchmark mains, you can write your own mains which
determine experimental setup and gui, etc.
------------End of Notes------------------------------
Below is the diagram of the structure of the new RL-interface. This is
sure to change slightly, but it gives a good overall sense of the
ambition and direction of the project (thanks again Anna :).
Below are my (Mark) thoughts on the task_spec and data types for
actions, observations and rewards:
>In the meeting it was mentioned that the task_spec could specify a
>"level" to which it complies. I think this might be the best way
>to go. That way it can be pretty simple (lower levels) while still
>allowing for more complication (higher levels).
>
>So, then, the first level could be that all the data types are
>integers and the task_spec simply specifies how many states and
>actions there are.
>
Level 1 Environments:
"1,numStates,numActions,terminalState"
Eg: "1,5,2,-1" for a world with 5 states, 2 actions, and -1 being
terminal.
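For illustration, a Level 1 task_spec of this form could be parsed with
something like the following sketch (parse_level1 and Level1Spec are
made-up names, not part of the interface):

```c
#include <stdio.h>

/* Hypothetical container for a parsed Level 1 task_spec. */
typedef struct {
    int level;
    int num_states;
    int num_actions;
    int terminal_state;
} Level1Spec;

/* Parse "1,numStates,numActions,terminalState".
   Returns 1 on success, 0 on a malformed string or wrong level. */
int parse_level1(const char *task_spec, Level1Spec *out)
{
    if (sscanf(task_spec, "%d,%d,%d,%d",
               &out->level, &out->num_states,
               &out->num_actions, &out->terminal_state) != 4)
        return 0;
    return out->level == 1;
}
```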
>Level 2 could be for a continuous world. All data types would
>be floating point (should actions be? Continuous actions seem
>more complicated, maybe best to leave to a higher level?). The
>task_spec would specify the range of the state space (and action
>space?) (Does the range of the reward space need to be supplied?
>Does that add anything?).
>
>
Level 2 Environments:
"2,minState,maxState,minAction,maxAction,terminalState"
eg: "2,42,4242,0,7,-1" for a world with states numbered from 42 to 4242
and actions numbered from 0 to 7, with -1 being the terminal state.
>Level 3 could be for a multi-dimensional discrete world. So,
>the state and action space would be an array of integers. The
>task_spec would specify the number of dimensions, followed by
>the size of each dimension (range of values in that dimension).
>
Level 3 Environments:
"3,numStateDimensions,minState1,maxState1,...,minStateN,maxStateN,
numActionDimensions,minAction1,maxAction1,...,minActionN,maxActionN,
terminalState1,...,terminalStateN"
(newlines for email clarity, not part of the specification)
Eg: "3,2,0,10,1,6,3,1,3,1,3,1,4,0,1" would be a world with 2 state
dimensions, the first ranging from 0 to 10, the second from 1 to 6. 3
action dimensions, the first ranging from 1 to 3, the second also from 1
to 3, and the third from 1 to 4. The terminal state would have the first
dimension be 0 and the second dimension be 1.
The state would be terminal if all of the state variables equal the
terminalState values. Something like -MAX_INT could be used for, say,
just the first state variable to indicate that it is terminal.
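The terminal test described above (every dimension must match its
terminalState value) might look like this sketch:

```c
/* A Level 3 state is terminal only if every dimension equals the
   corresponding terminalState value from the task_spec. */
int is_terminal(const int *state, const int *terminal, int num_dims)
{
    int i;
    for (i = 0; i < num_dims; i++)
        if (state[i] != terminal[i])
            return 0;
    return 1;
}
```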
>Level 4 could be for a multi-dimensional continuous world. State
>and action space would be an array of floating point. task_spec
>would be the # of dimensions followed by the range of each
>dimension.
>
Level 4 Environments:
Same as level 3, but starting with a 4 not a 3.
>And, finally, level 5 could be for a general multi-dimensional
>world. So, the task_spec would specify the # of dimensions, followed
>by whether or not that dimension is continuous, followed by the
>range in that dimension. (for a world where a possible action
>might be [5, 0.3, 2], for example).
>
Level 5 Environments:
"5,numStateDimensions,discreteSFlag1,minState1,maxState2,...,discreteSFlagN,
minStateN,maxStateN,numActionDimensions,discreteAFlag1,minAction1,maxAction1,
...,discreteAFlagN,minActionN,maxActionN,terminalState1,...,terminalStateN"
Where the discrete flags are simply 1 or 0, 1 for a continuous
state/action and 0 for a discrete state/action.
Eg: a world with 3 state dimensions, the first being continuous values
from 0 to 1, the second being discrete values from 1 to 10, and the
third being continuous values from 5 to 8. There are 2 action
dimensions, the first being discrete values from 1 to 8, and the second
being continuous values from 0 to 3.141. The terminal state would be
0.5, 7, 7.
>For levels 2 and above the terminal state would need to be specified
>in the task_spec as well.
Code
Below is the current (untested) code. In this code the task_spec is
assumed to be of the form "n m", where n is the number of states and m
the number of actions.
Globals.h
- Global definitions for RL_Interface component
Env.h
- The interface to the environment that the RL_Interface expects
Agent.h
- The interface to the agent that the RL_Interface expects
The environment and agent functions required
by the RL Interface are passed (and return) actions, observations and
rewards. But the data type of actions, observations and rewards depends
on the task_spec contents. So, how can one write a method signature for
these methods that will never have to change, given that C requires the
type to be specified a priori? I've had 2 ideas on this:
1) Make the type of actions, observations, and rewards a void*. So, in
other words, just a pointer to a space in memory. The size and contents
of this memory will depend on the task_spec. This allows the actual
size and type to be flexible, while still permanently setting the type
in the code. The main disadvantage is that it is ugly and prone to
errors (like seg faults...).
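To make idea 1 concrete, a rough sketch of how a Level 1 agent would
cast the void* it receives (the signature and names here are
assumptions for illustration, not the actual RL Interface ones):

```c
/* Sketch of idea 1: observations and actions travel as void*, and the
   agent casts them based on what the task_spec said. */
typedef void *Observation;
typedef void *Action;

static int chosen_action; /* storage for the action we hand back */

/* For a Level 1 (all-integer) task the observation is a single int. */
Action agent_step(double reward, Observation obs)
{
    int state = *(int *)obs;   /* cast dictated by the task_spec level */
    chosen_action = state % 2; /* stand-in for real action selection */
    return (Action)&chosen_action;
}
```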
2) Make a different version of each method for each level of the
task_spec. So, instead of agent_step(), there would be agent_step1(),
agent_step2(), agent_step3(), agent_step4(), and agent_step5(). So, the
agent would have to export functions for every level that it supports.
We could also add a function to the RL Interface spec that would allow
the interface to ask the agent which levels it supports. I haven't
fully thought this idea through, because I like the first one better...
Mark Lee
In response to Mark Lee's comments on
varying the data types in C:
1.
Making things into generic void* is the way to go for something like
this. It's nice and flexible as Mark mentioned, and easy enough to use.
Since things have to be sent through network sockets at some point, we
are going to lose our explicit data types anyway. So, the RLInterface
is going to need a set of functions for extracting, from the Task_Spec
string, the size of the data chunks that it expects to send and
receive. Since it needs this functionality anyway, why not include a
set of functions for parsing the void* chunk of memory?
So, I'm thinking something like:
RLInterface::setEnvTaskSpec(char* task_spec)
{
Parse task_spec for the level, number of states/actions, ranges of the
states/actions, discrete/continuousness of the states/actions, and
terminal state value.
}
This would automatically be called when env_init is called; the
interface would simply intercept and parse the task_spec before also
passing it on to the user.
Then, a series of functions like the following could be implemented for
the users who aren't comfortable with either parsing the task_spec
themselves or dealing with the void* pointer:
//Functions for querying about the environment itself.
int RLInterface::getEnvironmentLevel();
int RLInterface::getNumStateVariables();
int RLInterface::getNumActionVariables();
bool RLInterface::getStateIsContinuous(int stateDimensionNumber);
void* RLInterface::getStateMinValue(int stateDimensionNumber);
//min/max could be combined into a struct for more clarity.
void* RLInterface::getStateMaxValue(int stateDimensionNumber);
void* RLInterface::getStateValue(void* state, int stateDimensionNumber);
bool RLInterface::getActionIsContinuous(int actionDimensionNumber);
void* RLInterface::getActionMinValue(int actionDimensionNumber);
void* RLInterface::getActionMaxValue(int actionDimensionNumber);
void* RLInterface::getActionValue(void* action, int actionDimensionNumber);
bool RLInterface::checkIfTerminalState(void* state); //checks given
state against the stored terminal state.
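As a sketch of what one of these accessors might do internally, here is
a plain-C stand-in for getStateValue (the packed, unpadded memory
layout and the function name are assumptions for illustration):

```c
#include <stddef.h>

/* Walk a packed sequence of int and double fields and return a
   pointer to the requested dimension. is_continuous[] is assumed to
   have been filled in when the task_spec was parsed. */
void *get_state_value(void *state, int dim,
                      const int *is_continuous, int num_dims)
{
    char *p = (char *)state;
    int i;
    if (dim < 0 || dim >= num_dims)
        return NULL; /* out-of-range dimension */
    for (i = 0; i < dim; i++)
        p += is_continuous[i] ? sizeof(double) : sizeof(int);
    return (void *)p;
}
```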
The retrieved values would still have to be cast by the user into the
appropriate type ((int) or (double), depending on the discreteness of
that variable), but we are providing the functions for checking that,
so it shouldn't be a problem. Though, the following functions could
additionally be added:
int RLInterface::getDiscreteStateValue(void* state, int stateDimensionNumber);
double RLInterface::getContinuousStateValue(void* state, int stateDimensionNumber);
int RLInterface::getDiscreteStateMinValue(int stateDimensionNumber);
double RLInterface::getContinuousStateMinValue(int stateDimensionNumber);
int RLInterface::getDiscreteStateMaxValue(int stateDimensionNumber);
double RLInterface::getContinuousStateMaxValue(int stateDimensionNumber);
int RLInterface::getDiscreteActionValue(void* action, int actionDimensionNumber);
double RLInterface::getContinuousActionValue(void* action, int actionDimensionNumber);
int RLInterface::getDiscreteActionMinValue(int actionDimensionNumber);
double RLInterface::getContinuousActionMinValue(int actionDimensionNumber);
int RLInterface::getDiscreteActionMaxValue(int actionDimensionNumber);
double RLInterface::getContinuousActionMaxValue(int actionDimensionNumber);
These would automatically cast the value from a void* to the actual
data type. They would also check whether that state/action is actually
discrete or continuous, and throw an exception if the user tries to
extract a discrete value from a continuous state dimension (and similar
such errors), in order to prevent casting things into nonsense values.
Naturally these checks would be bypassed by the void* values, so the
more advanced programmers wouldn't have to put up with the extremely
minor performance hit these checks would incur (of course, they could
always parse the state and construct the action variables themselves
based on the task_spec).
/*
Function for setting action variables in the void* memory chunk. This
would take an action variable "action" that points to a section of
memory, the dimension of this action variable that we want to change,
and a pointer to the value that we want to store in this dimension of
the action. The action variable passed in is modified to reflect this
change.
*/
void RLInterface::setActionVariable(void* action, int actionDimensionNumber, void* actionDimensionValue);
/*
This function would return a pointer to the value stored in the
requested dimension of the action.
*/
void* RLInterface::getActionVariable(void* action, int actionDimensionNumber);
These functions could also do things like throw exceptions if the user
tries to set values outside of the given state/action's allowable
range, or tries to access the 7th state dimension when there are only
6, stuff like that, to prevent segmentation faults and hopefully
provide some meaningful feedback as to what the user is doing wrong.
Also, the data type of a reward should not need to vary (I think), so
just make it always a double or a float. None of the current
environment levels say anything about what the reward data type is, so
I say make it a double unless we add more environment levels.
I'd write this code if everyone thinks it's a good approach.
2.
This would work, but it's a really ugly solution to the problem. I'd
recommend strongly against it. Though, there should be a function added
to the RLInterface and the agent, something like:
bool RLInterface::getAgentSupportsEnvironmentLevel(int level);
in case a given agent doesn't work with, say, non-discrete variables.
So, that all said, any flaws in my approach or should I go ahead and
code it up?
-Thomas Pittman
Just posted my not-tested SarsaAgent.cpp
code. Also changed the links
for RLInterface.c and Agent.h because I had to change agent_end to
return void.
One assumption I'm making is that the states are integers numbered from
0 to n-1. One assumption I'm not making is that actions are anything
other than the Action type, which I'm happy enough about. Then I make
my own action array and index it from 0 to m-1, of course.
I didn't want to make a state array because it's one thing iterating
over an action array, and another entirely needing to create a
structure the size of the number of states and looking up the *index*
into it on every step. That seems like a horrible waste of time. So
should we enforce that states are ints, or should we generalize soon?
-Anna
Just some random thought:
The terminal condition for learning could be handled as follows: the
struct observation would have 3 fields: reward, state, and
terminal_flag. This makes it easy for the environment to set and for
the interface to check. It simplifies task_specs too, because I think
we can come up with counter-examples every time someone proposes a
numeric terminal value. Also, C can't do the Python trick of sometimes
returning a string and sometimes returning a number.
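For concreteness, the struct might look something like this (a sketch;
the helper functions are invented for illustration):

```c
/* Sketch of the suggestion above: the terminal condition travels with
   the observation instead of being encoded as a magic numeric value. */
typedef struct {
    double reward;
    int state;         /* an int here, for the simple tabular case */
    int terminal_flag; /* 1 if this state ends the episode */
} Observation;

/* The environment just sets the flag... */
Observation env_goal_reached(void)
{
    Observation o;
    o.reward = 1.0;
    o.state = 0;
    o.terminal_flag = 1;
    return o;
}

/* ...and the interface checks it, with no special terminal value. */
int episode_over(const Observation *o)
{
    return o->terminal_flag != 0;
}
```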
-Adam
Further discussions of task_spec:
We think that agents and environments will usually be able to handle
only one kind of everything (continuous/discrete, multidimensional),
and should have functions which return what main_spec they can handle.
This will be an integer, but defined in RL_Interface in words
(DISCRETE_SPACE, etc.). We'll discuss.
So main_spec is separate from task_spec - main_spec is something that
the agent and environment can either handle or not. task_spec is the
information the environment needs given the kind it is: number of
states, number of dimensions, range of continuous values, etc.
States in main_spec level 1 (integer everything) start counting from 0.
It will be the responsibility of the RL_Interface to cast things
appropriately based on the specs, and to ensure that the agent and
environment are okay with that spec. The agent and environment need
functions (name suggestions?) which return true/false, or which return
the task level they can handle.
It's not going to be too complicated with supported task specs, because
now we are assuming that the agent only handles one type. This seems
fair.
-Anna
Some notes on ideas for the RL interface.
These won't make all that much sense by themselves, but may give some
sense of what i am thinking about on this (and i don't have time just
now to polish).
version number - this interface supports env versions X or higher,
agent versions Y or higher
calls may have an N in them to indicate expected num of args, absent
args default to...zero.
For env designers:
version number - this env has a version number
env must be able to describe the numericity of the sensations it
generates:
num_cardinal
num_interval
num_ordinal
num_ratio
env must be able to describe the ranges for each type of sensation with
non-zero numericity:
cardinal: {0,1,..,N-1}, provide N for each
interval: [min,max), provide min,max for each
...
env must be able to describe the data types for each type of sensation
with non-zero numericity:
cardinal:
env must be able to describe the numericity of the actions it expects
to receive:
num_cardinal
num_interval
num_ordinal
num_ratio
env must be able to describe the ranges for each type of action with
non-zero numericity:
cardinal: {0,1,..,N-1}, provide N for each
interval: [min,max), provide min,max for each
env must be able to describe the data types for each type of action
with non-zero numericity:
cardinal:
For example: tabular env: generates N, expects M, one argument each
For example: mountain car:
For agent designers:
version number - this agent has a version number
agent must describe the numbers of the sensation space it accepts:
max_num_cardinal
max_num_interval
max_num_ordinal
max_num_ratio
agent must describe the envelope of the actions it generates:
num_cardinal
num_interval
num_ordinal
num_ratio
agent may be able to take info on the actions it should generate:
num_cardinal
num_interval
num_ordinal
num_ratio
(and same for their ranges)
agent may be able to take info on the sensations it should expect:
num_cardinal
num_interval
num_ordinal
num_ratio
(and same for their ranges)
Or better might be to make multiple calls, one per variable, providing
all info about it.
rich
Rich, I think I understand the overall flavor of your approach, but
like you said the details aren't entirely clear. Perhaps when you get a
chance you could comment on how it differs from the following (a
marriage of several people's input).
I think the task_spec should include a version number X and a level.
Why is the level important? Well, in the interface (C) globals have to
be declared. So if the task spec levels correspond to each of what we
determine are the meaningful/possible scenarios, then we can declare
things as void* (as in Mark's previous posting), then cast the types
based on the level. In this way the level describes the types that we
expect and the expected format of actions and sensations.
I imagine the following levels; this is for the task_spec generated by
the env only... others would easily follow.
format :: (Version_num, level, [sensation_minS], [sensation_maxS],
[action_minA], [action_maxA])
maxA - max action value
minA - min action value
maxS - max sensation value
minS - hmmmmm
level 1:
(X, 1, 0, N-1, 0, N-1)
Discrete actions and discrete sensations ... standardize numbering
starting at zero.
In this case the state can be multi-dimensional, but can always be
converted into an integer.
sensations are ints
actions are ints
level 2:
(X, 2, 0, N-1, [minA::INT], [maxA::INT])
Discrete actions but multiple actions and discrete sensations ...
therefore we must give an array of ints specifying the min action value
in each dimension and a corresponding max array.
sensations are ints
actions are an array of ints
level 3:
(X, 3, 0, N-1, [minA::DOUBLE], [maxA::DOUBLE])
Continuous single or multiple actions and discrete sensations ...
Now we must pass arrays of doubles to specify the range of each
dimension of the actions
The assumption here is that once you're working with continuous
actions, whether the agent emits a single output or a multidimensional
one, it is easier to work with arrays of doubles instead of having two
separate cases for double and [double] actions.
sensations are ints
actions are an array of doubles
level 4:
(X, 4, [minS::DOUBLE], [maxS::DOUBLE], 0, N-1 )
Continuous sensations that may be single or multi dimensional. Actions
are discrete.
Now we must pass arrays of doubles to specify the range of each
dimension of the state and we standardize action labeling from 0 to
N-1. Again same assumption with sensation here as with actions
previously. If we are working with continuous might as well work with
double arrays.
sensations an array of doubles
actions are ints
level 5:
(X, 5, [minS::DOUBLE], [maxS::DOUBLE], [minA::INT], [maxA::INT])
Same as level 4 except actions are now multi-D. So we have to specify
range of each action dimension, hence arrays of ints.
sensations an array of doubles
actions an array of ints
level 6:
(X, 6, [minS::DOUBLE], [maxS::DOUBLE], [minA::DOUBLE], [maxA::DOUBLE])
Everything is continuous, sensations and actions. May be single or
multidimensional.
sensations an array of doubles
actions an array of doubles
This way the level corresponds to the types of the sensations and
actions without having them explicitly passed as a string ("action-int"
for example).
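Since the level alone pins down the data types, the interface could
recover them with a single lookup; a sketch (the enum and function
names are illustrative, not part of the proposal):

```c
/* Map the proposed level number (1-6) to the sensation and action
   types the interface should expect. */
typedef enum { TYPE_INT, TYPE_INT_ARRAY, TYPE_DOUBLE_ARRAY } DataType;

/* Returns 1 on a known level, 0 otherwise. */
int types_for_level(int level, DataType *sensation, DataType *action)
{
    switch (level) {
    case 1: *sensation = TYPE_INT;          *action = TYPE_INT;          return 1;
    case 2: *sensation = TYPE_INT;          *action = TYPE_INT_ARRAY;    return 1;
    case 3: *sensation = TYPE_INT;          *action = TYPE_DOUBLE_ARRAY; return 1;
    case 4: *sensation = TYPE_DOUBLE_ARRAY; *action = TYPE_INT;          return 1;
    case 5: *sensation = TYPE_DOUBLE_ARRAY; *action = TYPE_INT_ARRAY;    return 1;
    case 6: *sensation = TYPE_DOUBLE_ARRAY; *action = TYPE_DOUBLE_ARRAY; return 1;
    default: return 0; /* unknown level */
    }
}
```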
As someone who has worked a lot with continuous sensations and actions,
I think the min and max (which cover the range, extreme values and
cardinality in the discrete case) are very important. If you are tile
coding you want to know the range so you can scale the sensation into
[0,1]. I think now that the density would be important: not the number
of states per se, or the distribution, but a number between 0 and 1
that specifies the sparseness of the sensations produced. This is
important for tile coding, where I want to choose the number of tilings
and memory size. Maybe that's providing too much info... not sure. It
might look like this:
(X, 6, [minS::DOUBLE], [maxS::DOUBLE], [minA::DOUBLE], [maxA::DOUBLE],
[densityS::DOUBLE] )
an array specifying the densities of each dimension of the state. Of
course this only applies to continuous sensations, levels 4 through 6.
IN SUM:
I think this gives the interface the info it needs to decide if an
agent and env can talk to each other, and reduces the burden on the end
user. I think this specification, if well documented, is much simpler
and faster than multiple functions. It also gives the agent lots of
info about the task. My hope/dream is that the env and agent writers
have to define only around 4 functions and this spec. More complication
may make our lives and the interface clearer, but I think that is
backwards. The interface can/will be horrible inside (no-one will look
anyway), but writing agents and envs will be fast and simple. We will
finally reach a point where we can literally write a new env and
quickly plug it into a prewritten agent, as long as these specs are
handled properly.
Crap? Ok? - let me know!!!
-Adam
I've made a new proposal for the task_spec format. Let me know what you
think.
Mark
The new task spec looks good, Mark. The only thing that I would change
would be to use a 0 for discrete and a 1 for continuous (simply for the
sake of proper boolean representation), instead of the 1 and 2 that you
proposed.
-Thomas Pittman
0 and 1 are more convenient. I agree with that change.
Adam, in response to your post on the task-spec page:
I think having 6 task spec numbers might make the code a bit uglier. I
was just thinking about how the code would look, and I think it'd be
something like:
if (level 1)
{}
elif (level 2)
{}
...
To me this new way seems maybe a bit more straightforward and clean.
Separating the levels for state and action gives many fewer levels,
because we don't have to do the product of the two (you know what I
mean?).
Some dimensions discrete and some continuous: I was just thinking about
environments like... for example, maybe you're driving a car and you
can control the gas pedal (continuous), but you also control the high
beams (discrete). If everything is assumed to be continuous, then how
does that work? Does the environment say that the high beams are
continuous from 0 to 1? And what does 0.5 mean? I just think it's maybe
more flexible to allow both.
Mark
Brian pointed out that we should get
feedback from people on the front lines of RL applications, to make
sure our interface would be useful to them. Do we need replay? Do we
need an "undo"? Do we need to feed in the seed to generate exactly the
same environmental response? Do we need to account for batch updating?
How do we enforce these things? Do we need a concerted effort to take
these into account, either through communication with people running
experiments or reading their papers and see what they used?
-Anna
So I have written some code to interface with an environment written
for RLbench (assuming it's compiled already in a make file or
something). It supports all three generative models of RLbench,
specified by the user through a compiler flag. It also parses the
output of RLbench to construct a task spec according to Mark's last
post on it.
A few assumptions I have made:
1 - Even though the task spec specifies whether the actions & state
are int, double or mixed, and the number of dimensions, I have assumed
that everything (actions and states) should be declared and passed
around as vectors of doubles.
This eliminates declaring things as void* and significantly reduces
code length and complexity. An example of this complexity sheds a lot
of light:
env_step method
in:
action - can be INT, [INT], double, [DOUBLE], or a mix
out :
reward - double
state - can be INT, double, [DOUBLE], or a mix
So trying to implement this is a pain, and I don't think it gives added
clarity to the user. If the task spec tells them the state is (int,
double, double, int) then that's enough. If we use vectors of doubles
we have the same type across the interface code, we can still access
elements array-style (vec[3]), and vectors support .size(), which is
nice.
I have talked this over with Anna and we feel this reduces complexity
and doesn't limit expressiveness.
2 - the interface will be extended to support the agent passing a state
and/or random seed from its step method.
3 - the observation struct will be extended to support an optional
return "state".
I will post the code tomorrow after I have chatted with Mark about some
things. I am making a similar program to interface with RLbench agents.
Which now forces me to define the following:
task_spec interaction proposal
The environment passes a task_spec to the interface, and the agent
passes its task_spec to the interface. The interface will compare them
and determine whether the two can "talk". If so, the environment's
task_spec is passed to the agent's init method.
The form of the agent's task spec would be the following (much like
Mark's env one):
"(V:S:A)"
The first part is the version info.
State (S) & Action (A) Info: Both have the same format:
level #dimensions
where level is a number specifying whether the space is continuous or
discrete, and #dimensions is a number specifying the number of
dimensions in the space, but it's slightly different here:
if #dimensions == 0
single dimension state/action
if #dimensions == 1
multi dimension state/action
level can either be 1, 2, or 3 where the meanings of each are as
follows:
1 means that the space is discrete
2 means that the space is continuous
3 means that the space is partially continuous
example
"(1:2_1:2_0)"
means version number 1, continuous multi-dimensional state space, and
continuous single dimensional action
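A sketch of how the interface might parse this agent spec (the struct
and field names here are mine, not part of the proposal; the "_" is
used as the separator between level and #dimensions, as in the example
above):

```c
#include <stdio.h>

/* Parsed form of the proposed agent task_spec "(V:S:A)",
   e.g. "(1:2_1:2_0)". */
typedef struct {
    int version;
    int state_level,  state_multi;  /* level 1/2/3, multi flag 0/1 */
    int action_level, action_multi;
} AgentSpec;

/* Returns 1 if the string matches the "(V:level_dims:level_dims)"
   shape, 0 otherwise. */
int parse_agent_spec(const char *spec, AgentSpec *out)
{
    return sscanf(spec, "(%d:%d_%d:%d_%d)",
                  &out->version,
                  &out->state_level, &out->state_multi,
                  &out->action_level, &out->action_multi) == 5;
}
```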
If everyone is ok with the idea of 2 task_specs and the interface
comparing them AND the agent task_spec I have proposed then I will go
ahead with the RLbench agent interfacing code.
Next would be doing the same thing for CSLL.
-Adam
i think i have a reasonable way to handle
less-standard performance measures such as the expected-reward measure
yaki brought up. we do it through side-calls directly to the
environment. so, you write your benchmark, calling rl_step, and
on each step you make an additional call to the environment to get the
additional performance measure. (this could be anything you
wanted and could have any name, but maybe something like
env_expected_reward() would be appropriate for yaki's case.) this
strategy would allow complete generality in additional performance
measures. And i don't think we should view this as too much of a
climb down from a pure interface. i think in practice people will
want to do lots of things in their main and benchmark programs -- for
example graphical displays of various env and agent variables -- that
are idiosyncratic to their particular purposes.
rich
I was thinking about State and Action as
defined in Globals.h. Rich suggested that we declare them as integers
for the public release in order to present the simple case. However, we
(Adam and I) have been more inclined to declare them as double arrays
to present the general case (because an array of doubles can store an
integer, a double, an array of integers, or an array of doubles). The
problem with an array of doubles when the actual type is something like
an integer is that both the env and agent must do some ugly casting on
each step.
I've thought of an alternative that might be both simple and general:
using a union. So, State would be declared as:
typedef union {
int i;
double d;
int* ai;
double* ad;
} State;
in Globals.h. Then, let's say that the state is a single integer. The
env code would look like this:
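A minimal sketch of the idea for the single-integer case (the function
names are invented; the point is that both sides use the union's .i
field directly, with no casts):

```c
/* The union from Globals.h, as proposed above. */
typedef union {
    int i;
    double d;
    int *ai;
    double *ad;
} State;

/* Env side: the state is a single integer, so just set the .i field. */
State env_current_state(void)
{
    State s;
    s.i = 3; /* stand-in for the env's real state */
    return s;
}

/* Agent side: read the same field, again with no cast. */
int agent_read_state(State s)
{
    return s.i;
}
```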
So, in theory, this method is fairly simple to use (no ugly casting
needed) and really general (Globals.h should not need to be changed
except for really bizarre state types, like linked lists). Does this
sound ok to everyone?
Mark
We should plan on the type definitions for
sensation and action to be made specially for the individual envs. This
is what will happen almost always. We should plan on and for it. We
should provide a few examples where the type definitions are different
for different env-agent combinations. It would be nice if we could have
two definitions in play in the same executable, as when running several
agent-env combinations. Inside the interface, generic type definitions
should be used. In the agent and environment, more specific definitions
should be used.