Reinforcement Learning and Artificial Intelligence (RLAI)
Option Discovery
 
Author: Anna, October 2004
The aim of this page is to serve as a hub for research on option discovery, with links to relevant papers and discussion both of those papers and of option discovery in general.


Papers on options:
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
(Richard Sutton, Doina Precup, Satinder Singh) '99
    - I (Anna) think this is considered the definitive paper on options.
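
For concreteness: in that paper an option is a triple (I, pi, beta), where I is the set of states in which the option may be invoked, pi is the option's internal policy, and beta gives the probability of terminating in each state. A minimal Python sketch of that triple (the class and field names here are just illustrative, not from the paper):

from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable    # stand-in state type
Action = Hashable   # stand-in action type

@dataclass
class Option:
    initiation_set: Set[State]              # I: states where the option may be invoked
    policy: Callable[[State], Action]       # pi: action choice while the option runs
    termination: Callable[[State], float]   # beta: probability of terminating in a state

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set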

Recent Advances in Hierarchical Reinforcement Learning
(Andrew G. Barto, Sridhar Mahadevan) '03
    - Cosmin recommended this as an overview.

Papers relating to option discovery:
Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
(Ozgur Simsek, Andrew Barto) ICML 2004

Dynamic Abstraction in Reinforcement Learning via Clustering
(Shie Mannor, Ishai Menache, Amit Hoze, Uri Klein) ICML 2004

Effective Online Detection of Task-Independent Landmarks
(Martin V. Butz, Samarth Swarup, David E. Goldberg) PRWK workshop at ICML 2004

Learning Options in Reinforcement Learning
(Martin Stolle, Doina Precup) '02

Hierarchical Reinforcement Learning Based on Subgoal Discovery
(Bram Bakker, Jurgen Schmidhuber) NCI 2004

Automatic Discovery of Subgoals in Reinforcement Learning
(Amy McGovern, Andrew G. Barto) '01

acQuire-macros: An Algorithm for Automatically Learning Macro-actions
(Amy McGovern) '98

Autonomous Discovery of Temporal Abstractions From Interaction with an Environment
(Amy McGovern) PhD Dissertation '02

Identifying Useful Subgoals in Reinforcement Learning by Local Graph Partitioning
(Ozgur Simsek, Alicia P. Wolfe, Andrew G. Barto) '04

Improving Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning
(R. Matthew Kretchmar, Todd Feil, Rohit Bansal) Journal of Computer Science and Technology, October '03
The link to this paper seems to be down right now, and I can't find an alternative. They build on the work of McGovern and Barto '01, introducing a new algorithm (the FD algorithm) that uses both visit frequency and temporal distance to identify subgoals.
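
Since the paper itself is unavailable, here is only a rough, hypothetical sketch of the general idea as summarized above: score candidate subgoals by how often they are visited and by where along trajectories they tend to occur. The particular combination rule below is my own guess, not the FD algorithm:

from collections import defaultdict

def score_subgoal_candidates(trajectories):
    """trajectories: a list of state sequences, each running from start to goal."""
    visit_count = defaultdict(int)       # how often each state appears
    position_sum = defaultdict(float)    # accumulated normalized trajectory position

    for traj in trajectories:
        for t, state in enumerate(traj):
            visit_count[state] += 1
            position_sum[state] += t / max(len(traj) - 1, 1)

    # Guessed combination rule: favor frequently visited states that tend
    # to occur in the middle of trajectories rather than at the ends.
    scores = {}
    for state, count in visit_count.items():
        mean_pos = position_sum[state] / count
        centrality = 1.0 - 2.0 * abs(mean_pos - 0.5)   # 1 at the middle, 0 at the ends
        scores[state] = count * centrality
    return scores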

Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Kumar Goel) MSc Dissertation '03
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Goel, Manfred Huber) '03
The second link is a concise version of their approach to subgoal discovery.

I. Menache, S. Mannor and N. Shimkin, "Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning," ECML'02 - European Conference on Machine Learning, Helsinki, August 2002. In: T. Elomaa et al. (eds.), Lecture Notes in Artificial Intelligence 2430, Springer, pp. 295-306.

M. Pickett and A. G. Barto. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning. In Proceedings of the 19th International Conference on Machine Learning (2002).

S. Thrun and A. Schwartz. Finding structure in reinforcement learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems (NIPS) 7, Cambridge, MA, 1995. MIT Press.

Iba, G. A. (1989). A heuristic approach to the discovery of macro-operators. Machine Learning, 3, 285-317.
Cited by many of the option discovery papers. This comes out of the search literature, but the idea can be applied to the problem of option discovery: a state with a higher heuristic value than its adjacent states is declared a 'peak', and macro-operators are created to join peaks along the trajectory.
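
A minimal sketch of that peak idea, assuming each trajectory comes with a heuristic evaluation function and a parallel list of actions (all names here are stand-ins, not Iba's notation):

def find_peaks(trajectory, heuristic):
    """Indices of states whose heuristic value exceeds both neighbors'."""
    peaks = []
    for i in range(1, len(trajectory) - 1):
        h = heuristic(trajectory[i])
        if h > heuristic(trajectory[i - 1]) and h > heuristic(trajectory[i + 1]):
            peaks.append(i)
    return peaks

def build_macros(actions, peaks):
    """Join consecutive peaks: each macro-operator is the action
    subsequence between them. actions[i] is assumed to be the action
    taken from trajectory[i]."""
    return [actions[a:b] for a, b in zip(peaks, peaks[1:])]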

Digney, B. (1998). Learning hierarchical control structure for multiple tasks and changing environments. In From Animals to Animats 5: The Fifth Conference on the Simulation of Adaptive Behavior.  The MIT Press.

Cosmin's CMPUT 651 project:
Here's a link to my CMPUT 651 project about supervised learning of option policies: Integrating Teaching and Learning in the Options Framework

Eddie's Summaries on Subgoal Discovery in Reinforcement Learning:
The first report can be found here: State Visitation-Based Subgoal Discovery Algorithms
The second report can be found here: Approaches to Subgoal Discovery in Reinforcement Learning



Is there research into the connection between macros in search and options, or are they completely unrelated? Does the temporal nature of options make them a completely different beast from anything in search?

Answer: One way of thinking about the relationship is that options generalize macro-operators to the stochastic case. The deterministic case is quite different because temporally extended actions can be "open loop": they can ignore what is happening as they execute, because what happens is always the same.
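
To make the open-loop/closed-loop distinction concrete, here is a toy sketch (env_step, policy, and terminated are illustrative stand-ins, not from any paper above):

def run_macro(env_step, state, action_sequence):
    # Open loop: execute the fixed action sequence regardless of what the
    # environment does; this is fine when transitions are deterministic.
    for a in action_sequence:
        state = env_step(state, a)
    return state

def run_option(env_step, state, policy, terminated):
    # Closed loop: consult the current state for every action and stop
    # when the termination condition fires; this is what stochastic
    # environments require of a temporally extended action.
    while not terminated(state):
        state = env_step(state, policy(state))
    return state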
