
Hierarchical learning and planning in partially observable Markov decision processes

Posted on: 2003-09-13
Degree: Ph.D
Type: Dissertation
University: Michigan State University
Candidate: Theocharous, Georgios N
Full Text: PDF
GTID: 1468390011980797
Subject: Computer Science
Abstract/Summary:
Sequential decision making under uncertainty is a fundamental problem in artificial intelligence. It is faced by autonomous agents embedded in complex environments (e.g., physical systems such as robots, softbots, and automated manufacturing systems) that must choose actions to achieve long-term goals efficiently. Incomplete knowledge about the state of the environment, termed hidden state, and the uncertain effects of actions make sequential decision making difficult.

Partially Observable Markov Decision Processes (POMDPs) are a general framework for sequential decision making in environments where states are hidden and actions are stochastic. A POMDP model represents the dynamics of the environment, such as the probabilistic outcomes of actions and the probabilistic relationships between the agent's observations and the hidden states. Unfortunately, both learning the dynamics of the environment and planning scale poorly as application domains grow larger.

In this dissertation, we propose and investigate a Hierarchical POMDP (HPOMDP) model to scale learning and planning to large partially observable environments. In scaling planning, the key ideas are spatial and temporal abstraction: the agent uses longer-duration macro-actions instead of only primitive one-step actions. Macro-actions are crucial in that they produce long traces of experience, which help the agent disambiguate perceptually aliased situations. In scaling learning, the main benefit of hierarchy is that a multi-resolution representation of the problem is easier to maintain, interpret, and reuse. We investigate the new HPOMDP framework within the context of indoor robot navigation.

This dissertation makes the following principal contributions. We formally introduce and describe the HPOMDP model. We derive a hierarchical EM algorithm for learning the parameters of a given HPOMDP and show empirical convergence. We introduce two approximate training methods, “reuse-training” and “selective-training”, which speed up training and yield better learned models. We also derive two planning algorithms for approximating the optimal policy of a given HPOMDP. We conduct a detailed experimental study of the learning algorithms for indoor robot navigation and show how robot localization improves with our learning methods. We also apply the planning algorithms to indoor robot navigation, both in simulation and in the real world. The results show that the planning algorithms are very successful at taking the robot to any environment state starting from no positional knowledge, and that they require significantly fewer steps than flat approaches.
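To make the POMDP machinery above concrete, here is a minimal sketch of the standard Bayes belief update that underlies POMDP state estimation. The array names, shapes, and toy numbers are illustrative assumptions, not taken from the dissertation.

    import numpy as np

    # Standard POMDP belief update (Bayes filter). Conventions assumed here:
    # T[a][s, s'] : probability of reaching state s' from state s under action a.
    # O[a][s', o] : probability of observing o in state s' after action a.

    def belief_update(belief, action, observation, T, O):
        """Return the posterior belief after taking `action` and seeing `observation`."""
        predicted = belief @ T[action]                      # predict: sum_s b(s) T(s, a, s')
        posterior = predicted * O[action][:, observation]   # correct: weight by observation likelihood
        norm = posterior.sum()
        if norm == 0.0:
            raise ValueError("observation has zero probability under the current belief")
        return posterior / norm

    # Toy example: 3 hidden states, 2 actions, 2 observations.
    T = [np.full((3, 3), 1 / 3) for _ in range(2)]
    O = [np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]) for _ in range(2)]
    b = np.array([1 / 3, 1 / 3, 1 / 3])
    b = belief_update(b, action=0, observation=1, T=T, O=O)
    print(b)  # belief shifts toward states that make observation 1 likely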
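The role of macro-actions can be illustrated the same way: propagating the belief through a longer run of primitive steps and observations tends to concentrate it, which is how long traces of experience help disambiguate perceptually aliased states. The sketch below reuses belief_update from above and, purely for illustration, treats a macro-action as a fixed list of primitive actions; the dissertation's HPOMDP macro-actions are instead defined through abstract states in the hierarchy.

    def macro_belief_update(belief, primitive_actions, observations, T, O):
        """Propagate the belief through one (simplified) macro-action by running
        belief_update once per primitive step, conditioning on each observation
        received along the way."""
        for action, obs in zip(primitive_actions, observations):
            belief = belief_update(belief, action, obs, T, O)
        return belief

    # Usage, continuing the toy example above:
    b = np.array([1 / 3, 1 / 3, 1 / 3])
    b = macro_belief_update(b, primitive_actions=[0, 1, 0], observations=[1, 0, 1], T=T, O=O)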
Keywords/Search Tags: Decision, Planning, Partially observable, Indoor robot navigation, HPOMDP, Hierarchical, Environment