Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
(Ozgur Simsek, Andrew Barto) ICML 2004
Dynamic Abstraction in Reinforcement Learning via Clustering
(Shie Mannor, Ishai Menache, Amit Hoze, Uri Klein) ICML 2004
Effective Online Detection of Task-Independent Landmarks
(Martin V. Butz, Samarth Swarup, David E. Goldberg) PRWK workshop at ICML 2004
Learning Options in Reinforcement Learning
(Martin Stolle, Doina Precup) '02
Hierarchical Reinforcement Learning Based on Subgoal Discovery
(Bram Bakker, Jurgen Schmidhuber) NCI 2004
Automatic Discovery of Subgoals in Reinforcement Learning
(Amy McGovern, Andrew G. Barto) '01
acQuire-macros: An Algorithm for Automatically Learning Macro-actions
(Amy McGovern) '98
Autonomous Discovery of Temporal Abstractions From Interaction with an Environment
(Amy McGovern) PhD Dissertation '02
Identifying Useful Subgoals in Reinforcement Learning by Local Graph Partitioning
(Ozgur Simsek, Alicia P. Wolfe, Andrew G. Barto) '04
Improving Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning
(R. Matthew Kretchmar, Todd Feil, Rohit Bansal) Journal of Computer Science and Technology, October '03
The link to this paper seems to be down right now and I can't find an
alternative link. Basically, they build on the work of McGovern
and Barto '01. A new algorithm (the FD algorithm) is introduced
which uses both visit frequency and temporal distance to determine
subgoals.
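Since I can't check the paper itself, here is only a rough, hypothetical sketch of how a visit-frequency and temporal-distance score might be combined to rank subgoal candidates. The mixing weight `alpha`, the normalisation, and the helper names are my own assumptions for illustration, not the actual FD algorithm.

    # Hypothetical sketch only: the real FD algorithm may combine
    # frequency and distance very differently.
    from collections import defaultdict

    def score_subgoal_candidates(trajectories, alpha=0.5):
        """Rank states by how often, and how deep into, successful
        trajectories they are first visited. `alpha` is an assumed
        mixing weight between the two criteria."""
        visits = defaultdict(int)        # number of trajectories reaching the state
        first_visit = defaultdict(list)  # time step of first visit in each trajectory

        for traj in trajectories:        # each trajectory is a list of states
            seen = set()
            for t, s in enumerate(traj):
                if s not in seen:        # count first visits only
                    seen.add(s)
                    visits[s] += 1
                    first_visit[s].append(t)

        n = len(trajectories)
        horizon = max(len(traj) for traj in trajectories)
        scores = {}
        for s in visits:
            freq = visits[s] / n                                          # visit frequency
            dist = (sum(first_visit[s]) / len(first_visit[s])) / horizon  # normalised temporal distance
            scores[s] = alpha * freq + (1 - alpha) * dist                 # assumed linear combination
        return scores

    # Toy example: three successful grid-world trajectories that all pass
    # through a doorway state.
    trajs = [
        ["s0", "s1", "door", "s7", "goal"],
        ["s0", "s2", "door", "s8", "goal"],
        ["s0", "s3", "door", "s7", "goal"],
    ]
    scores = score_subgoal_candidates(trajs)
    scores.pop("s0", None)    # exclude the shared start state
    scores.pop("goal", None)  # and the terminal state itself
    print(max(scores, key=scores.get))  # -> "door"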
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Kumar Goel) MSc Dissertation '03
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
(Sandeep Goel, Manfred Huber) '03
The second link is a concise version of their approach to subgoal
discovery.
I. Menache, S. Mannor and N. Shimkin, "
Q-Cut
- dynamic discovery of sub-goals in reinforcement learning,"
ECML'02 - European Conference on Machine Learning, Helsinki, August
2002. In: T. Elomma et al. (eds.), Lect. Notes Artific. Intel. 2430,
Springer, pp. 295-306.
PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning
(M. Pickett, A. G. Barto) Proceedings of the 19th International Conference on Machine Learning (ICML), 2002
Finding Structure in Reinforcement Learning
(S. Thrun, A. Schwartz) In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems (NIPS) 7, Cambridge, MA, 1995. MIT Press.
A Heuristic Approach to the Discovery of Macro-Operators
(G. A. Iba) Machine Learning, 3, 285-317, 1989
Cited by many of the option discovery papers. This comes out of
the search literature, but the idea can be applied to the problem of
option discovery. A state whose heuristic value is higher than that of its
adjacent states is declared a 'peak'. A macro-operator is created to join
successive peaks along the trajectory.
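To make the peak idea concrete, here is a minimal sketch of my own; the heuristic function and the way macros are composed are assumptions for illustration, not Iba's original formulation. A state counts as a peak if its heuristic value exceeds that of both neighbours along the trajectory, and each macro-operator is the action subsequence between two consecutive peaks.

    # Illustrative sketch of peak-to-peak macro creation (assumptions noted above).
    def find_peaks(states, heuristic):
        """Indices of states whose heuristic value is strictly higher than
        that of both of their neighbours along the trajectory."""
        peaks = []
        for i in range(1, len(states) - 1):
            if (heuristic(states[i]) > heuristic(states[i - 1])
                    and heuristic(states[i]) > heuristic(states[i + 1])):
                peaks.append(i)
        return peaks

    def build_macros(states, actions, heuristic):
        """One macro-operator per pair of consecutive peaks: the sequence of
        primitive actions taken between the two peak states."""
        peaks = find_peaks(states, heuristic)
        return [actions[a:b] for a, b in zip(peaks, peaks[1:])]

    # Toy usage: heuristic values given by an assumed lookup table.
    h = {"A": 1, "B": 3, "C": 2, "D": 5, "E": 1}.get
    states = ["A", "B", "C", "D", "E"]
    actions = ["up", "up", "right", "down"]  # actions[i] leads from states[i] to states[i+1]
    print(build_macros(states, actions, h))  # -> [['up', 'right']]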
Learning Hierarchical Control Structure for Multiple Tasks and Changing Environments
(B. Digney) In From Animals to Animats 5: The Fifth Conference on the Simulation of Adaptive Behavior, 1998. The MIT Press.
Cosmin's CMPUT 651 project: