
Markov Theory Based Planning And Sensing Under Uncertainty

Posted on: 2015-01-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: A J Bai | Full Text: PDF
GTID: 1228330467974877 | Subject: Computer application technology
Abstract/Summary:
In Artificial Intelligence research, the agent-based paradigm aims to provide a unifying framework for conceptualizing, designing, and implementing intelligent systems that sense, act, and learn autonomously in dynamic and/or stochastic environments, in order to solve a growing number of complex problems. Agents, particularly robots of various kinds, are playing increasingly important roles in the world economy and in people's everyday lives, from satellites to smartphones. Generally speaking, perceptual inputs from sensors carry inevitable noise and errors. The effects of actuators are likewise unpredictable, subject to noise or even outright failure. There may also be different levels of hidden information that cannot be observed directly. Such uncertainties pose major challenges for agent planning and sensing.

Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) provide the theoretical and algorithmic foundations for optimal planning and sensing under uncertainty. However, solving large MDPs and POMDPs exactly is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. To address this challenge in practice, researchers typically resort to approximation techniques such as online planning, hierarchical planning, Monte-Carlo simulation, and particle filtering. Following MDP and POMDP theory, this thesis focuses on developing efficient approximate algorithms for large MDPs and POMDPs. Specifically, we propose MAXQ-OP, an online planning algorithm based on MAXQ hierarchical decomposition; we develop the DNG-MCTS and D2NG-POMCP algorithms, which apply the idea of Thompson sampling to Monte-Carlo planning in MDPs and POMDPs; and we develop a particle filtering over sets (PFS) approach to the multi-human tracking problem.

The proposed hierarchical online planning algorithm, MAXQ-OP, combines the advantages of online planning and hierarchical planning, providing a principled solution for programming autonomous agents in large stochastic domains. Specifically, we perform online decision-making by following the MAXQ value function decomposition. We empirically evaluate our algorithm on the Taxi problem, a common benchmark for MDPs. The experimental results show that MAXQ-OP finds a near-optimal policy online with substantially less computation time than traditional online planning algorithms. The RoboCup soccer simulation 2D domain is a very large test-bed for artificial intelligence research; its key challenge lies in the fact that it is a fully distributed, multi-agent stochastic system with continuous state, action, and observation spaces. We conducted a long-term case study in the RoboCup 2D domain and developed a team named WrightEagle that has won multiple world and national championships in the annual RoboCup competitions. The results of this case study confirm MAXQ-OP's potential to scale up to very large domains.
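To make the decomposition concrete, the following is a minimal sketch of how online action selection under a MAXQ hierarchy might look. It is illustrative only: the Task interface and its methods (is_primitive, subtasks, completion, heuristic) are assumptions introduced for exposition, not code from the thesis.

```python
import math

class Task:
    """Hypothetical node in a MAXQ task hierarchy (illustrative interface,
    not the thesis's actual code)."""
    def is_primitive(self):            raise NotImplementedError
    def is_terminal(self, state):      raise NotImplementedError
    def reward(self, state):           raise NotImplementedError  # primitive tasks only
    def heuristic(self, state):        raise NotImplementedError  # value estimate at cutoff
    def subtasks(self, state):         raise NotImplementedError  # applicable child tasks
    def completion(self, state, sub):  raise NotImplementedError  # estimate of C(self, s, sub)

def evaluate(task, state, depth, max_depth):
    """Estimate V(task, state) by recursively applying the MAXQ decomposition
    Q(task, s, sub) = V(sub, s) + C(task, s, sub), returning the best value
    and the subtask that achieves it."""
    if task.is_primitive():
        return task.reward(state), None     # V of a primitive action is its reward
    if depth >= max_depth or task.is_terminal(state):
        return task.heuristic(state), None  # cut off deep search with a heuristic
    best_q, best_sub = -math.inf, None
    for sub in task.subtasks(state):
        v, _ = evaluate(sub, state, depth + 1, max_depth)
        q = v + task.completion(state, sub)  # value of sub, plus completing task afterwards
        if q > best_q:
            best_q, best_sub = q, sub
    return best_q, best_sub

# Online planning loop: at every step, re-evaluate from the current state
# and execute (the first primitive action of) the best root subtask.
```

In the full MAXQ-OP algorithm the completion term must itself be approximated online rather than looked up exactly, which is where much of the algorithmic work lies.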
Monte-Carlo tree search (MCTS) has recently drawn great interest in planning and learning under uncertainty, where one of the key challenges is the trade-off between exploration and exploitation. We develop novel MCTS approaches, namely DNG-MCTS and D2NG-POMCP, which use posterior action sampling to select actions for online planning in MDPs and POMDPs. Specifically, we treat the cumulative reward obtained by taking an action in a search node and following a tree policy thereafter as a random variable following an unknown distribution. We parametrize this distribution by introducing the necessary hidden parameters and infer their posterior in a Bayesian setting. Thompson sampling is then used to explore and exploit the search tree, selecting each action according to its posterior probability of being optimal. Experimental results confirm that the proposed algorithms outperform state-of-the-art approaches on several benchmark problems, suggesting that they can be applied successfully to very large real-world problems.

The ability of an autonomous robot to detect, track, and identify potentially multiple humans is essential for socialized human-robot interaction in dynamic environments. The online multi-object tracking problem is equivalent to real-time belief update in a complex POMDP. The main challenge is that, without knowing the actual number of humans, the robot needs to estimate each human's state in real time from sequentially ambiguous observations, including inevitable false and missing detections, while both the robot and the humans are constantly moving. In this thesis, we propose a novel particle filtering over sets (PFS) approach to address this challenge. We define joint states and observations as finite sets and develop motion and observation functions accordingly. The target identification problem is then solved with the expectation-maximization (EM) method, given the updated particles. The set formulation allows us to avoid performing observation-to-target association directly, leading to high fault tolerance and robustness in complex dynamic environments with frequent detection noise and errors. The overall PFS algorithm outperforms the state-of-the-art in terms of CLEAR MOT metrics on the PETS2009 dataset. We also demonstrate the effectiveness of PFS on a real robot, CoBot.
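As a rough illustration of the set-valued filtering idea, here is a deliberately simplified sketch in which each particle is a finite set of hypothesized human states. The motion, birth/death, and detection parameters, and the max-based observation score, are stand-ins chosen for brevity; the thesis's actual set likelihood and the EM-based identification step are not reproduced here.

```python
import math
import random

def step_particle(humans, detections, p_birth=0.02, p_death=0.05, noise=0.3):
    """Propagate one set-valued particle: each hypothesized human moves with
    noisy constant velocity; a human may vanish, and a new one may appear
    near a detection. All parameters are illustrative."""
    survivors = []
    for (x, y, vx, vy) in humans:
        if random.random() > p_death:  # human may leave the scene
            survivors.append((x + vx + random.gauss(0, noise),
                              y + vy + random.gauss(0, noise), vx, vy))
    if detections and random.random() < p_birth:  # a new human may enter
        dx, dy = random.choice(detections)
        survivors.append((dx, dy, 0.0, 0.0))
    return survivors

def weight_particle(humans, detections, p_false=0.1, p_miss=0.2, sigma=1.0):
    """Score a particle against the detection set without an explicit
    observation-to-target assignment: each detection is explained by its
    best-matching hypothesized human or treated as a false alarm."""
    w = 1.0
    for (dx, dy) in detections:
        if humans:
            best = max(math.exp(-((dx - x) ** 2 + (dy - y) ** 2) / (2 * sigma ** 2))
                       for (x, y, _, _) in humans)
            w *= (1.0 - p_false) * best + p_false
        else:
            w *= p_false
    w *= p_miss ** max(0, len(humans) - len(detections))  # penalize missed detections
    return w

def pfs_step(particles, detections):
    """One filtering step: propagate, weight, and resample the set-valued particles."""
    moved = [step_particle(p, detections) for p in particles]
    weights = [weight_particle(p, detections) for p in moved]
    total = sum(weights) or 1.0
    return random.choices(moved, weights=[w / total for w in weights], k=len(moved))
```

Because each particle carries a variable-size set of humans, the filter jointly tracks the number of targets and their states; identities can then be recovered from the updated particles, e.g. by the EM pass described above.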
Keywords/Search Tags: Markov Decision Process, Partially Observable Markov Decision Process, Decision-Theoretic Planning, Hierarchical Online Planning, Monte-Carlo Tree Search, Multi-Object Tracking