
Model learning and application of partially observable Markov decision processes

Posted on: 2009-11-10
Degree: Ph.D
Type: Dissertation
University: Duke University
Candidate: Lihan He
Full Text: PDF
GTID: 1448390002992518
Subject: Engineering
Abstract/Summary:
The partially observable Markov decision process (POMDP) has been widely used in robot navigation and decision making. Learning an accurate POMDP model is a prerequisite for model-based POMDP applications. Given the definitions of states, actions, and observations, learning a POMDP model involves inferring the state-transition probabilities and state-dependent observation probabilities. This dissertation presents three Bayesian methods for learning a POMDP model, based on MEDUSA (Markovian exploration with decision based on the use of sampled models algorithm), multi-task learning (MTL), and life-long learning (LLL). These learning algorithms are introduced within two POMDP applications: adaptive landmine sensing and online target searching.

After presenting background material on POMDPs, MEDUSA, MTL, and LLL in Chapter 1, Chapter 2 addresses the application of multimodality sensing of landmines using two sensors. We first assume that adequate data are available for learning an underlying POMDP model of mines and clutter, and describe the method of building an appropriate model. This is then generalized by assuming that training data for mines and clutter are not available a priori, and the underlying POMDP model is learned online using a modified MEDUSA approach. An oracle is employed adaptively to reveal the label information of the underground mines/clutter under interrogation, and the posterior of the underlying POMDP model is updated based on the interrogation result. Example results are presented using measured sensing data from two sensors for buried mines and clutter, to demonstrate the performance of the algorithm.

Chapters 3-5 address the application of online target searching in an unknown environment. POMDPs and a simultaneous localization and mapping (SLAM) algorithm are combined in this application to navigate a robot (searching for an acoustic source) and to build a global map simultaneously.
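As background to the model-learning task described above, the state-transition and observation probabilities are exactly the quantities that drive the standard POMDP belief (Bayes-filter) update during navigation. A minimal NumPy sketch, with array shapes and names chosen purely for illustration (not taken from the dissertation):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    a: index of the action just taken
    o: index of the observation just received
    T: state-transition probabilities, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    O: observation probabilities, O[a, s', o] = P(o | s', a), shape (A, S, n_obs)

    Returns the posterior belief b'(s') proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s).
    """
    predicted = T[a].T @ b               # predictive distribution over next states
    unnormalized = O[a, :, o] * predicted  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()
```

A tiny two-state example: with `T[0] = [[0.9, 0.1], [0.2, 0.8]]`, `O[0] = [[0.8, 0.2], [0.3, 0.7]]`, and a uniform initial belief, taking action 0 and observing 0 shifts the belief strongly toward state 0.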
Chapter 3 introduces the SLAM algorithm and proposes a geometric map representation, in which a map is represented by a set of geometric units. Chapter 4 presents the online target-searching framework, based on a modified MEDUSA and a grid-based SLAM, under the assumption that all possible subworlds that may be encountered are available a priori. An accurate POMDP model for each possible subworld is built before searching. The modified MEDUSA is performed for each subworld during the searching process, to identify the correct underlying model. The assumption of knowing all possible subworlds a priori is removed in Chapter 5, where two transfer-learning approaches, multi-task learning and life-long learning, are proposed for learning a POMDP model when the training data from a single task are insufficient. The matrix stick-breaking process prior employed in the algorithms provides a flexible sharing structure, allowing two learning tasks to share only a subset of states, with their associated state-transition and observation probabilities, rather than the entire POMDP model. Results for simulated environments and for a real environment demonstrate the effectiveness of the framework and the algorithms.
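The Bayesian model learning described throughout the abstract rests on a standard conjugate construction: discrete transition distributions are given Dirichlet priors, observed transitions update the pseudo-counts, and candidate models are drawn from the posterior (as in MEDUSA's sampled-model scheme). A minimal sketch under those assumptions, with all names hypothetical:

```python
import numpy as np

def update_counts(alpha, episodes):
    """Add observed (s, a, s') transitions to Dirichlet pseudo-counts.

    alpha: array of shape (A, S, S); alpha[a, s] holds the Dirichlet
    parameters over next states for taking action a in state s.
    episodes: iterable of (s, a, s_next) triples observed online.
    """
    for s, a, s_next in episodes:
        alpha[a, s, s_next] += 1.0
    return alpha

def sample_transition_model(alpha, rng):
    """Draw one transition model with T[a, s, :] ~ Dirichlet(alpha[a, s])."""
    A, S, _ = alpha.shape
    T = np.empty_like(alpha)
    for a in range(A):
        for s in range(S):
            T[a, s] = rng.dirichlet(alpha[a, s])
    return T
```

Each sampled `T` is a complete candidate transition model; maintaining several such samples and reweighting them as interrogation results arrive is the essence of the sampled-model posterior updates the abstract refers to. (The observation probabilities admit the same Dirichlet treatment.)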
Keywords/Search Tags: POMDP, Decision, Process, Modified MEDUSA, Application, Online target searching, Algorithm