Reinforcement Learning and Artificial Intelligence (RLAI) research
projects under consideration at the University of Alberta

The ambition of this page is to list RLAI research projects as they
were envisioned at UofA in Fall 2003 and Jan 2005. Its long-term
ambition is to be refactored into other pages or to provide a
readable page listing active projects. See the relevant meeting notes
for details on individual projects.
- From TD learning to TD networks (TD Nets), and why it's a great idea.
- The Grounded World Modeling Project (GWMP). See the announcement
of an ICML Workshop to get the general idea. This is likely to
include work on representing a broad range of world knowledge in a
predictive, sensori-motor form, so that conventional ontologies and
knowledge representation can be grounded in experience. This work can
be broken down into two challenges:
  - Predictive State Representations
  - The bit2bit problem
- Making RL work without assuming state is available,
i.e., without the Markov assumption.
- Off-policy learning with function approximation. It has long been
known that it is difficult to find sound RL algorithms with all three
of these desirable characteristics:
  - Off-policy learning - the ability to learn about one policy
while following another, just as Q-learning can learn the optimal
policy while behaving randomly. This characteristic is critical to
being able to learn a variety of things at once: you can only follow
one policy, but you would like to learn about many, in parallel.
  - Bootstrapping - the ability to estimate some quantity (typically
a prediction) based on an existing estimate, "learning a guess from a
guess," as in temporal-difference learning. Bootstrapping seems
essential for off-policy learning, and in practice it seems essential
in order for on-policy learning to be efficient.
  - Function approximation - the ability to generalize, in at least a
linear sense, from observed states or state-action pairs to
unobserved ones.
For example, many example tasks are known in which Q-learning with
linear function approximation will diverge to infinity with time.
Several ideas have been proposed for solving this problem:
  - Use an averaging function approximator. Is this practical? Can we
get good function approximation with an averager, or is the
additional generality of the full linear case needed?
  - Use a second-order method, such as LSTD(lambda). Some have
suggested that such methods somehow avoid the problem. If so,
probably the key ideas that enable this can be extended to the
conventional first-order setting.
  - Use importance sampling, as in Precup, Sutton & Dasgupta.
  - Two other ideas from Rich's notes.
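The instability referred to above can be reproduced in a few lines with
the well-known two-state counterexample, here in its off-policy TD(0)
form (the specific features, discount, and step size are illustrative
choices, not from this page): two states share one weight w through
linear features x(s1)=1 and x(s2)=2, and off-policy updates repeat only
the s1 -> s2 transition.

```python
# Two-state off-policy divergence sketch: repeated updates of the
# s1 -> s2 transition (reward 0) under linear TD(0) multiply w by
# (1 + alpha * (2*gamma - 1)) each step, which exceeds 1 here.

gamma = 0.9   # discount factor (illustrative)
alpha = 0.1   # step size (illustrative)
w = 1.0       # single shared weight

for t in range(100):
    td_error = 0.0 + gamma * (2 * w) - (1 * w)  # r + gamma*v(s2) - v(s1)
    w += alpha * td_error * 1.0                 # feature of s1 is 1

print(w)  # grows geometrically -> diverges to infinity with time
```

Because the updated value v(s2) = 2w rises faster than the value being
updated, each correction makes the error larger, which is exactly the
interaction of off-policy updating, bootstrapping, and linear function
approximation described above.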
- Generalization Sculpting. What can we do to make function
approximation work better? How can we learn the right biases in a
life-long learning context? Online cross-validation. Adaptive step
sizes.
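One well-known approach to adapting step sizes online is Sutton's IDBD
algorithm, which learns a separate step size for each weight of a
linear predictor. The sketch below is only illustrative of that idea:
the target weights, meta step size, and feature distribution are all
assumptions, not details from this page.

```python
import math
import random

# IDBD sketch (after Sutton, 1992): LMS-style linear regression in
# which each weight's step size alpha_i = exp(beta_i) is itself
# adapted online using a memory trace h_i.

n = 5
w = [0.0] * n                   # learned weights
beta = [math.log(0.05)] * n     # log per-weight step sizes
h = [0.0] * n                   # per-weight memory traces
theta = 0.01                    # meta step size (assumed)

target_w = [1.0, -2.0, 0.5, 0.0, 3.0]  # hypothetical true weights

random.seed(0)
for t in range(2000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = sum(ti * xi for ti, xi in zip(target_w, x))   # noiseless target
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
    for i in range(n):
        beta[i] += theta * delta * x[i] * h[i]        # meta-level update
        alpha_i = math.exp(beta[i])                   # current step size
        w[i] += alpha_i * delta * x[i]                # base-level update
        h[i] = h[i] * max(0.0, 1.0 - alpha_i * x[i] * x[i]) + alpha_i * delta * x[i]
```

Features whose step sizes have been helpful recently (positive
correlation between the current error and the trace h_i) get their
step sizes increased, which is one concrete way to learn biases
online in a life-long learning setting.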
- Robots learning from interacting with people.
- Policy gradient methods
- Stopping mu.
- Turnpike-horizon idea.
- See also Rich's research
summary (NSERC proposal).