Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
(Ozgur Simsek, Andrew Barto) ICML 2004
Dynamic Abstraction in Reinforcement Learning via Clustering
(Shie Mannor, Ishai Menache, Amit Hoze, Uri Klein) ICML 2004
Effective Online Detection of Task-Independent Landmarks
(Martin V. Butz, Samarth Swarup, David E. Goldberg) PRWK workshop at ICML 2004
Learning Options in Reinforcement Learning
(Martin Stolle, Doina Precup) '02
Hierarchical Reinforcement Learning Based on Subgoal Discovery
(Bram Bakker, Jurgen Schmidhuber) NCI 2004
Automatic Discovery of Subgoals in Reinforcement Learning
(Amy McGovern, Andrew G. Barto) '01
acQuire-macros: An Algorithm for Automatically Learning Macro-actions
(Amy McGovern) '98
Autonomous Discovery of Temporal Abstractions From Interaction with an Environment
(Amy McGovern) PhD Dissertation '02
Identifying Useful Subgoals in Reinforcement Learning by Local Graph Partitioning
(Ozgur Simsek, Alicia P. Wolfe, Andrew G. Barto) '04
Improving Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning
(R. Matthew Kretchmar, Todd Feil, Rohit Bansal) Journal of Computer Science and Technology, October '03
The link to this paper seems to be down right now and I can't find an
alternative link. Basically, they build on the work of McGovern
and Barto '01. A new algorithm (the FD algorithm) is introduced
which uses both visit frequency and temporal distance to determine
subgoals.
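Since I can't check the paper itself, here is only a rough, hypothetical sketch of how a visit-frequency and temporal-distance score might be combined to rank subgoal candidates. The mixing weight `alpha`, the normalisation, and the helper names are my own assumptions for illustration, not the actual FD algorithm.

    # Hypothetical sketch only: the real FD algorithm may combine
    # frequency and distance very differently.
    from collections import defaultdict

    def score_subgoal_candidates(trajectories, alpha=0.5):
        """Rank states by how often, and how deep into, successful
        trajectories they are first visited. `alpha` is an assumed
        mixing weight between the two criteria."""
        visits = defaultdict(int)        # number of trajectories reaching the state
        first_visit = defaultdict(list)  # time step of first visit in each trajectory

        for traj in trajectories:        # each trajectory is a list of states
            seen = set()
            for t, s in enumerate(traj):
                if s not in seen:        # count first visits only
                    seen.add(s)
                    visits[s] += 1
                    first_visit[s].append(t)

        n = len(trajectories)
        horizon = max(len(traj) for traj in trajectories)
        scores = {}
        for s in visits:
            freq = visits[s] / n                                          # visit frequency
            dist = (sum(first_visit[s]) / len(first_visit[s])) / horizon  # normalised temporal distance
            scores[s] = alpha * freq + (1 - alpha) * dist                 # assumed linear combination
        return scores

    # Toy example: three successful grid-world trajectories that all pass
    # through a doorway state.
    trajs = [
        ["s0", "s1", "door", "s7", "goal"],
        ["s0", "s2", "door", "s8", "goal"],
        ["s0", "s3", "door", "s7", "goal"],
    ]
    scores = score_subgoal_candidates(trajs)
    scores.pop("s0", None)    # exclude the shared start state
    scores.pop("goal", None)  # and the terminal state itself
    print(max(scores, key=scores.get))  # -> "door"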
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Kumar Goel) MSc Dissertation '03
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Goel, Manfred Huber) '03
The second link is a concise version of their approach to subgoal
discovery.
I. Menache, S. Mannor and N. Shimkin, "
Q-Cut
- dynamic discovery of sub-goals in reinforcement learning,"
ECML'02 - European Conference on Machine Learning, Helsinki, August
2002. In: T. Elomma et al. (eds.), Lect. Notes Artific. Intel. 2430,
Springer, pp. 295-306.
PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning
(M. Pickett, A. G. Barto) Proceedings of the 19th International Conference on Machine Learning (ICML), 2002
Finding Structure in Reinforcement Learning
(S. Thrun, A. Schwartz) In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems (NIPS) 7, Cambridge, MA, 1995. MIT Press.
A Heuristic Approach to the Discovery of Macro-Operators
(G. A. Iba) Machine Learning, 3, 285-317, 1989
Cited by many of the option discovery papers. This comes out of
the search literature, but the idea can be applied to the problem of
option discovery. A state whose heuristic value is higher than that of its
adjacent states is declared a 'peak'. A macro-operator is created to join
successive peaks along the trajectory.
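To make the peak idea concrete, here is a minimal sketch of my own; the heuristic function and the way macros are composed are assumptions for illustration, not Iba's original formulation. A state counts as a peak if its heuristic value exceeds that of both neighbours along the trajectory, and each macro-operator is the action subsequence between two consecutive peaks.

    # Illustrative sketch of peak-to-peak macro creation (assumptions noted above).
    def find_peaks(states, heuristic):
        """Indices of states whose heuristic value is strictly higher than
        that of both of their neighbours along the trajectory."""
        peaks = []
        for i in range(1, len(states) - 1):
            if (heuristic(states[i]) > heuristic(states[i - 1])
                    and heuristic(states[i]) > heuristic(states[i + 1])):
                peaks.append(i)
        return peaks

    def build_macros(states, actions, heuristic):
        """One macro-operator per pair of consecutive peaks: the sequence of
        primitive actions taken between the two peak states."""
        peaks = find_peaks(states, heuristic)
        return [actions[a:b] for a, b in zip(peaks, peaks[1:])]

    # Toy usage: heuristic values given by an assumed lookup table.
    h = {"A": 1, "B": 3, "C": 2, "D": 5, "E": 1}.get
    states = ["A", "B", "C", "D", "E"]
    actions = ["up", "up", "right", "down"]  # actions[i] leads from states[i] to states[i+1]
    print(build_macros(states, actions, h))  # -> [['up', 'right']]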
Learning Hierarchical Control Structure for Multiple Tasks and Changing Environments
(B. Digney) In From Animals to Animats 5: The Fifth Conference on the Simulation of Adaptive Behavior, 1998. The MIT Press.
Cosmin's CMPUT 651 project: