
Hierarchical learning and planning in partially observable Markov decision processes

Posted on: 2003-09-13
Degree: Ph.D
Type: Dissertation
University: Michigan State University
Candidate: Theocharous, Georgios N
Full Text: PDF
GTID: 1468390011980797
Subject: Computer Science
Abstract/Summary:
Sequential decision making under uncertainty is a fundamental problem in artificial intelligence. It is faced by autonomous agents embedded in complex environments (e.g., physical systems such as robots, softbots, and automated manufacturing systems) that must choose actions to achieve long-term goals efficiently. Incomplete knowledge about the state of the environment, termed hidden state, and the uncertain effects of actions make sequential decision making difficult.

Partially Observable Markov Decision Processes (POMDPs) are a general framework for sequential decision making in environments where states are hidden and actions are stochastic. A POMDP model represents the dynamics of the environment, such as the probabilistic outcomes of actions and the probabilistic relationships between the agent's observations and the hidden states. Unfortunately, both learning the dynamics of the environment and planning scale poorly as application domains grow larger.

In this dissertation, we propose and investigate a Hierarchical POMDP (HPOMDP) model to scale learning and planning to large partially observable environments. In scaling planning, the key ideas are spatial and temporal abstraction: the agent uses longer-duration macro-actions instead of only primitive one-step actions. Macro-actions are crucial in that they produce long traces of experience, which help the agent disambiguate perceptually aliased situations. In scaling learning, the main benefit of hierarchy is that a multi-resolution representation of the problem is easier to maintain, interpret, and reuse. We investigate the new HPOMDP framework within the context of indoor robot navigation.

This dissertation makes the following principal contributions. We formally introduce and describe the HPOMDP model. We derive a hierarchical EM algorithm for learning the parameters of a given HPOMDP and show empirical convergence. We introduce two approximate training methods, “reuse-training” and “selective-training”, which speed up training and yield better learned models. We also derive two planning algorithms for approximating the optimal policy of a given HPOMDP. We conduct a detailed experimental study of the learning algorithms for indoor robot navigation and show how robot localization improves with our learning methods. We also apply the planning algorithms to indoor robot navigation, both in simulation and in the real world. The results show that the planning algorithms are very successful at taking the robot to any environment state starting from no positional knowledge, and that they require significantly fewer steps than flat approaches.
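To make the POMDP machinery above concrete, here is a minimal sketch of the standard Bayes belief update that underlies POMDP state estimation. The array names, shapes, and toy numbers are illustrative assumptions, not taken from the dissertation.

    import numpy as np

    # Standard POMDP belief update (Bayes filter). Conventions assumed here:
    # T[a][s, s'] : probability of reaching state s' from state s under action a.
    # O[a][s', o] : probability of observing o in state s' after action a.

    def belief_update(belief, action, observation, T, O):
        """Return the posterior belief after taking `action` and seeing `observation`."""
        predicted = belief @ T[action]                      # predict: sum_s b(s) T(s, a, s')
        posterior = predicted * O[action][:, observation]   # correct: weight by observation likelihood
        norm = posterior.sum()
        if norm == 0.0:
            raise ValueError("observation has zero probability under the current belief")
        return posterior / norm

    # Toy example: 3 hidden states, 2 actions, 2 observations.
    T = [np.full((3, 3), 1 / 3) for _ in range(2)]
    O = [np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]) for _ in range(2)]
    b = np.array([1 / 3, 1 / 3, 1 / 3])
    b = belief_update(b, action=0, observation=1, T=T, O=O)
    print(b)  # belief shifts toward states that make observation 1 likely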
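The role of macro-actions can be illustrated the same way: propagating the belief through a longer run of primitive steps and observations tends to concentrate it, which is how long traces of experience help disambiguate perceptually aliased states. The sketch below reuses belief_update from above and, purely for illustration, treats a macro-action as a fixed list of primitive actions; the dissertation's HPOMDP macro-actions are instead defined through abstract states in the hierarchy.

    def macro_belief_update(belief, primitive_actions, observations, T, O):
        """Propagate the belief through one (simplified) macro-action by running
        belief_update once per primitive step, conditioning on each observation
        received along the way."""
        for action, obs in zip(primitive_actions, observations):
            belief = belief_update(belief, action, obs, T, O)
        return belief

    # Usage, continuing the toy example above:
    b = np.array([1 / 3, 1 / 3, 1 / 3])
    b = macro_belief_update(b, primitive_actions=[0, 1, 0], observations=[1, 0, 1], T=T, O=O)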
Keywords/Search Tags: Decision, Planning, Partially observable, Indoor robot navigation, HPOMDP, Hierarchical, Environment