
Model learning and application of partially observable Markov decision processes

Posted on: 2009-11-10
Degree: Ph.D
Type: Dissertation
University: Duke University
Candidate: Lihan He
Full Text: PDF
GTID: 1448390002992518
Subject: Engineering
Abstract/Summary:
The partially observable Markov decision process (POMDP) has been widely used in robot navigation and decision making. Learning an accurate POMDP model is a prerequisite for model-based POMDP applications. Given the definitions of states, actions, and observations, learning a POMDP model involves inferring the state-transition probabilities and state-dependent observation probabilities. This dissertation presents three Bayesian methods for learning a POMDP model, based on MEDUSA (Markovian exploration with decision based on the use of sampled models algorithm), multi-task learning (MTL), and life-long learning (LLL). These learning algorithms are introduced within two POMDP applications: adaptive landmine sensing and online target searching.

After presenting background material on POMDPs, MEDUSA, MTL, and LLL in Chapter 1, Chapter 2 addresses the application of multimodality sensing of landmines using two sensors. We first assume that adequate data are available for learning an underlying POMDP model of mines and clutter, and describe the method of building an appropriate model. This is then generalized by assuming that training data for mines and clutter are not available a priori, and the underlying POMDP model is learned online using a modified MEDUSA approach. An oracle is employed adaptively to reveal the label information of the underground mines/clutter under interrogation, and the posterior of the underlying POMDP model is updated based on the interrogation result. Example results are presented using measured sensing data from two sensors for buried mines and clutter, to demonstrate the performance of the algorithm.

Chapters 3-5 address the application of online target searching in an unknown environment. POMDPs and a simultaneous localization and mapping (SLAM) algorithm are combined in this application to navigate a robot (searching for an acoustic source) and to build a global map simultaneously.
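As background to the model-learning task described above, the state-transition and observation probabilities are exactly the quantities that drive the standard POMDP belief (Bayes-filter) update during navigation. A minimal NumPy sketch, with array shapes and names chosen purely for illustration (not taken from the dissertation):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    a: index of the action just taken
    o: index of the observation just received
    T: state-transition probabilities, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    O: observation probabilities, O[a, s', o] = P(o | s', a), shape (A, S, n_obs)

    Returns the posterior belief b'(s') proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s).
    """
    predicted = T[a].T @ b               # predictive distribution over next states
    unnormalized = O[a, :, o] * predicted  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()
```

A tiny two-state example: with `T[0] = [[0.9, 0.1], [0.2, 0.8]]`, `O[0] = [[0.8, 0.2], [0.3, 0.7]]`, and a uniform initial belief, taking action 0 and observing 0 shifts the belief strongly toward state 0.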
Chapter 3 introduces the SLAM algorithm and proposes a geometric map representation, in which a map is represented by a set of geometric units. Chapter 4 presents the online target-searching framework, based on a modified MEDUSA and a grid-based SLAM, under the assumption that all possible subworlds that may be encountered are available a priori. An accurate POMDP model for each possible subworld is built before searching. The modified MEDUSA is performed for each subworld during the searching process, to identify the correct underlying model. The assumption of knowing all possible subworlds a priori is removed in Chapter 5, where two transfer-learning approaches, multi-task learning and life-long learning, are proposed for learning a POMDP model when the training data from a single task are insufficient. The matrix stick-breaking process prior employed in the algorithms provides a flexible sharing structure, allowing two learning tasks to share only a subset of states, with their associated state-transition and observation probabilities, rather than the entire POMDP model. Results for simulated environments and for a real environment demonstrate the effectiveness of the framework and the algorithms.
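The Bayesian model learning described throughout the abstract rests on a standard conjugate construction: discrete transition distributions are given Dirichlet priors, observed transitions update the pseudo-counts, and candidate models are drawn from the posterior (as in MEDUSA's sampled-model scheme). A minimal sketch under those assumptions, with all names hypothetical:

```python
import numpy as np

def update_counts(alpha, episodes):
    """Add observed (s, a, s') transitions to Dirichlet pseudo-counts.

    alpha: array of shape (A, S, S); alpha[a, s] holds the Dirichlet
    parameters over next states for taking action a in state s.
    episodes: iterable of (s, a, s_next) triples observed online.
    """
    for s, a, s_next in episodes:
        alpha[a, s, s_next] += 1.0
    return alpha

def sample_transition_model(alpha, rng):
    """Draw one transition model with T[a, s, :] ~ Dirichlet(alpha[a, s])."""
    A, S, _ = alpha.shape
    T = np.empty_like(alpha)
    for a in range(A):
        for s in range(S):
            T[a, s] = rng.dirichlet(alpha[a, s])
    return T
```

Each sampled `T` is a complete candidate transition model; maintaining several such samples and reweighting them as interrogation results arrive is the essence of the sampled-model posterior updates the abstract refers to. (The observation probabilities admit the same Dirichlet treatment.)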
Keywords/Search Tags: POMDP, Decision, Process, Modified MEDUSA, Application, Online target searching, Algorithm