Reinforcement Learning and Artificial Intelligence (RLAI)
Reinforcement learning interface documentation version 6

This web page describes a proposed standard for the interface between reinforcement learning agents and their environments. We specify the inputs and outputs of the functions defining the agent, the environment, and an optional simulation object, and we sketch implementations of the interface routines that the standard provides.

What an environment must provide

    env_init(env) --> env_spec
    env_start(env) --> sensation
    env_step(env, next_action) --> reward, sensation
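
For concreteness, here is a minimal sketch of these three routines in Python, for a small random-walk environment. The dict-based env, the env_spec fields, and the use of the string 'terminal' as the terminal sensation are illustrative assumptions, not part of the proposed standard.

    def env_init(env):
        # Set up the environment and return a description of the task.
        # The env_spec format shown here is an assumption.
        env['num_states'] = 5
        return {'num_states': 5, 'num_actions': 2}

    def env_start(env):
        # Begin a new episode and return the first sensation.
        env['state'] = env['num_states'] // 2
        return env['state']

    def env_step(env, next_action):
        # Apply the action: 0 moves left, 1 moves right.
        env['state'] += 1 if next_action == 1 else -1
        if env['state'] < 0 or env['state'] >= env['num_states']:
            # Walked off either end: reward 1 on the right, 0 on the left.
            return (1 if env['state'] >= env['num_states'] else 0), 'terminal'
        return 0, env['state']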

What an agent must provide

    agent_init(agent, env_spec) -->
    agent_start(agent, sensation) --> next_action
    agent_step(agent, reward, sensation) --> next_action
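
A correspondingly minimal agent, again only an illustrative sketch, might act at random; a learning agent would update its estimates in agent_step before choosing.

    import random

    def agent_init(agent, env_spec):
        # Record what the agent needs from the environment's description.
        agent['num_actions'] = env_spec['num_actions']

    def agent_start(agent, sensation):
        # Choose the first action of the episode.
        return random.randrange(agent['num_actions'])

    def agent_step(agent, reward, sensation):
        # A learning agent would update its value estimates here;
        # this placeholder simply keeps acting at random.
        return random.randrange(agent['num_actions'])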

What the interface provides

Lowest-level interface (agent, env, and next_action managed separately)

init(env, agent) -->
    agent_init(agent, env_init(env))

start(env, agent) --> sensation, next_action
    s = env_start(env)
    a = agent_start(agent, s)
    return s, a

step(env, agent, next_action) --> reward, sensation, next_action
    r, s = env_step(env, next_action)
    a = agent_step(agent, r, s)
    return r, s, a
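
Put together, a hand-written interaction loop over these three routines might look as follows, assuming Python versions of init, start, and step as defined above, and the illustrative environment and agent sketched earlier.

    env, agent = {}, {}
    init(env, agent)
    s, a = start(env, agent)
    for t in range(100):
        r, s, a = step(env, agent, a)
        if s == 'terminal':
            s, a = start(env, agent)   # begin the next episode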

steps(env, agent, next_action, num_steps) --> [reward, sensation, action, ..., next_action]
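
No body is given for steps; one natural sketch, in terms of the step routine just defined, is:

    def steps(env, agent, next_action, num_steps):
        # Run num_steps interactions, accumulating the flat
        # [reward, sensation, action, ...] history.
        history = []
        a = next_action
        for t in range(num_steps):
            r, s, a = step(env, agent, a)
            history += [r, s, a]
        return history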


episode(env, agent) --> [sensation, action, reward, ..., r_T]
    s, a = start(env, agent)
    list = [s, a]
    while s != terminal:
        r, s, a = step(env, agent, a)
        list = list + [r, s, a]
    return list minus its last two elements (the terminal sensation and the unused final action), so that r_T comes last
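
In runnable Python, assuming the 'terminal' sensation convention used in the sketches above, episode might read:

    def episode(env, agent):
        # Run one episode to termination and return its history,
        # ending with the final reward r_T.
        s, a = start(env, agent)
        history = [s, a]
        while s != 'terminal':
            r, s, a = step(env, agent, a)
            history += [r, s, a]
        return history[:-2]   # drop terminal sensation and unused action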

episodes(env, agent, num_episodes) --> [episode_1, ..., episode_num_episodes]
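
Again no body is given; a one-line sketch in terms of episode:

    def episodes(env, agent, num_episodes):
        # Run num_episodes complete episodes and collect their histories.
        return [episode(env, agent) for i in range(num_episodes)]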

Higher-level interface (agent, env, and next_action bundled together as a sim)

Sim(env, agent) --> sim
    agent_init(agent, env_init(env))
    return [agent, env, Null]

env(sim) --> env
agent(sim) --> agent
next_action(sim) --> next_action
next_action(sim) = a        (assignment form: sets the sim's stored next_action)
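
In Python a function call cannot appear on the left of an assignment, so a concrete sim might be a plain three-slot list with an explicit setter standing in for the next_action(sim) = a form. A sketch under that assumption:

    def Sim(env_, agent_):
        # Bundle agent, env, and a slot for the pending action.
        agent_init(agent_, env_init(env_))
        return [agent_, env_, None]

    def agent(sim):        return sim[0]
    def env(sim):          return sim[1]
    def next_action(sim):  return sim[2]

    def set_next_action(sim, a):
        # Stands in for the assignment form next_action(sim) = a.
        sim[2] = a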

sim_start(sim) --> sensation, next_action
    s = env_start(env(sim))
    a = agent_start(agent(sim), s)
    next_action(sim) = a
    return s, a

sim_step(sim) --> reward, sensation, next_action
    r, s = env_step(env(sim), next_action(sim))
    a = agent_step(agent(sim), r, s)
    next_action(sim) = a
    return r, s, a

sim_steps(sim, num_steps) --> [reward, sensation, action, ..., next_action]
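
A sketch of sim_steps, analogous to steps, assuming a Python sim_step written with the setter above:

    def sim_steps(sim, num_steps):
        # Run num_steps interactions through the bundled sim.
        history = []
        for t in range(num_steps):
            r, s, a = sim_step(sim)
            history += [r, s, a]
        return history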

sim_episode(sim) --> [s_0, a_0, r_1, s_1, ..., r_T]
    s, a = sim_start(sim)
    list = [s, a]
    while s != terminal:
        r, s, a = sim_step(sim)
        list = list + [r, s, a]
    return list minus its last two elements (the terminal sensation and the unused final action), so that r_T comes last

sim_episodes(sim, num_episodes) --> [episode_1, ..., episode_num_episodes]
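
And correspondingly, assuming a Python sim_episode written as above:

    def sim_episodes(sim, num_episodes):
        # Run num_episodes complete episodes through the sim.
        return [sim_episode(sim) for i in range(num_episodes)]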



