
Concurrent hierarchical reinforcement learning

Posted on: 2007-03-12
Degree: Ph.D.
Type: Dissertation
University: University of California, Berkeley
Candidate: Marthi, Bhaskara Mannar
Full Text: PDF
GTID: 1448390005968680
Subject: Artificial Intelligence
Abstract/Summary:
This dissertation investigates the use of hierarchy and concurrency in solving sequential decision problems that require several complex activities to be carried out in parallel. The approach is based on constraining an agent's behaviour using a multithreaded partial program.

The first major contribution is a language, Concurrent ALisp, for writing partial programs. Unlike previous such languages, Concurrent ALisp is designed for problems involving the control of multiple effectors, each of which may be engaged in a different task. The language allows the specification of multiple threads of control, which automatically coordinate their decisions so as to guarantee jointly optimal behaviour. A formal semantics is provided, showing that the problem of finding the optimal completion of a Concurrent ALisp program is equivalent to finding an optimal stationary policy in a related semi-Markov decision process.

The Concurrent ALisp language has been implemented in Lisp as part of an overall system for hierarchical reinforcement learning. Besides serving as a reference implementation of the language, the system contains several learning and planning algorithms and is designed to be modular and extensible, so that new environments, planning and learning algorithms, and representations for policies, value functions, and Q-functions can all be added easily. Since it is probably the first system with these features, its architecture is described.

Next, learning methods for Concurrent ALisp programs are investigated. First, a version of the standard Q-learning algorithm is considered and shown to converge to the optimal completion. Methods are then considered for making better use of the structure inherent in the domain. It is shown that, given a threadwise decomposition of the reward function, the learning process itself can be decomposed across threads while maintaining global optimality. Furthermore, within each thread, the learning process can be further decomposed based on the task structure of the thread. The value of these decompositions is that each piece to be learnt often needs to attend to only a small number of state variables, significantly reducing the number of parameters to be learnt.

The methods are then evaluated empirically on a large computer-game domain that is extremely challenging for existing algorithms. Several techniques are used to obtain a practical algorithm on this domain, including coordination graphs for efficient joint decision-making, relational linear function approximation to reduce the number of parameters to learn, and reward shaping to speed learning.

Together, the methods presented in this work enlarge the set of domains that can feasibly be handled by reinforcement learning. They allow the specification of prior knowledge, come with strong formal guarantees, and take advantage of both the threadwise and temporal structure of the problem.
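The threadwise decomposition described above lends itself to a compact illustration. The sketch below is not the dissertation's Concurrent ALisp system (which is implemented in Lisp); it is a simplified Python rendering, under assumed names such as ThreadQ and select_joint_action, of the idea that the global Q-function is represented as a sum of per-thread components, each updated with its own thread's share of the reward and each observing only the state variables relevant to that thread.

```python
# Illustrative sketch only: threadwise-decomposed Q-learning in the spirit of
# the decomposition above. Names and structure are hypothetical, not taken
# from the dissertation's Concurrent ALisp implementation.
import random
from collections import defaultdict
from itertools import product


class ThreadQ:
    """Per-thread Q-component, keyed only on the state variables that this
    thread observes (the 'small number of state variables' mentioned above)."""

    def __init__(self, relevant_vars, actions, alpha=0.1, gamma=0.95):
        self.relevant_vars = relevant_vars   # state variables this thread attends to
        self.actions = actions               # this thread's local action set
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)          # (abstract_state, joint_action) -> value

    def abstract(self, state):
        # Project the full state onto this thread's relevant variables.
        return tuple(state[v] for v in self.relevant_vars)

    def value(self, state, joint_action):
        return self.q[(self.abstract(state), joint_action)]

    def update(self, state, joint_action, thread_reward, next_state, next_joint_action):
        # On-policy update using only this thread's share of the reward;
        # summing these components over threads recovers a global update.
        key = (self.abstract(state), joint_action)
        target = thread_reward + self.gamma * self.value(next_state, next_joint_action)
        self.q[key] += self.alpha * (target - self.q[key])


def select_joint_action(threads, state, epsilon=0.1):
    """Epsilon-greedy joint action w.r.t. the SUM of per-thread Q-components.
    Brute-force enumeration is used here only for clarity; the coordination
    graphs mentioned above exist precisely to avoid this exhaustive maximization."""
    joint_actions = list(product(*[t.actions for t in threads]))
    if random.random() < epsilon:
        return random.choice(joint_actions)
    return max(joint_actions, key=lambda a: sum(t.value(state, a) for t in threads))
```

In this toy rendering each thread learns a table over its abstracted state and the joint action, and the agent acts greedily with respect to the summed components; the dissertation's contribution is showing that such a decomposition can be done while preserving global optimality, and making it practical with coordination graphs and function approximation.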
Keywords/Search Tags: Concurrent, Reinforcement