
Performance guarantee of a sub-optimal policy for a discrete Markov decision process and its application to a robotic surveillance problem

Posted on: 2015-03-03  Degree: Ph.D.  Type: Dissertation
University: Texas A&M University  Candidate: Park, Myoungkuk  Full Text: PDF
GTID: 1478390017496353  Subject: Mechanical engineering
Abstract/Summary:
This dissertation deals with the development and analysis of sub-optimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with classifying incursions across the perimeter. Whenever there is an incursion into the perimeter, a nearby Unattended Ground Sensor (UGS) signals an alert. A robot services the alert by visiting the alert location, collecting evidence in the form of video imagery, and transmitting it to the operator.

The accuracy of the operator's classification depends on the volume and freshness of the information gathered and provided by the robots at the locations where incursions occur. There are two competing needs for a robot: it must spend adequate time at an alert location to collect evidence that aids the operator in accurate classification, but it must also service other alerts as soon as possible so that the evidence it collects remains relevant. The control problem is to determine the optimal amount of time a robot must spend servicing an alert. The incursions are stochastic, and their statistics are assumed to be known.

The control problem may be posed as a Markov Decision Process (MDP). Dynamic Programming (DP) provides the optimal policy for the MDP. However, because of the "curse of dimensionality" of DP, finding the optimal policy is impractical in many applications; for a perimeter surveillance problem with two robots and five UGS locations, the number of states is on the order of billions. Approximate Dynamic Programming (ADP) via Linear Programming (LP) provides a way to approximate the value function and derive sub-optimal strategies. Using state partitioning and ADP, this dissertation provides different LP formulations for upper and lower bounds on the value function of the MDP and shows the relationship between the LPs and the MDP. The novel features of this dissertation are (1) the derivation of a tractable lower bound via LP and state partitioning, (2) the construction of a sub-optimal policy whose performance exceeds the lower bound, and (3) the derivation of an upper bound using a non-linear programming formulation. Together, the upper and lower bounds provide an approximation ratio for the value function. Finally, illustrative perimeter surveillance examples corroborate the results derived in this dissertation.
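To make the LP-based bounding idea concrete, the sketch below shows the classical linear-programming formulation of a discounted MDP and how a state-partitioning basis shrinks it. This is a minimal illustration on synthetic data, not the dissertation's surveillance model or its exact formulations; the transition matrices, rewards, partition matrix Phi, and the use of scipy's linprog are all assumptions of the sketch.

```python
# Minimal sketch (illustrative data, not the dissertation's model): the
# classical LP form of a discounted MDP. Any V feasible for V >= r + gamma*P*V
# dominates the optimal value function V*, so the LP optimum is an upper bound.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
# P[a] is the row-stochastic transition matrix under action a; r[s, a] is reward.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.random((n_states, n_actions))

# Exact LP:  min  c^T V   s.t.  V(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) V(s')
# Rewritten as (gamma*P - I) V <= -r to match linprog's A_ub @ x <= b_ub form.
c = np.ones(n_states)  # state-relevance weights (uniform here)
A_ub = np.vstack([gamma * P[a] - np.eye(n_states) for a in range(n_actions)])
b_ub = np.concatenate([-r[:, a] for a in range(n_actions)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_states)
print("exact LP upper bound on V*:", res.x)

# Approximate LP via state partitioning: restrict V = Phi @ w, where Phi
# indicates which partition each state belongs to. The same constraints,
# expressed in w, give a tractable LP whose solution still dominates V*.
Phi = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # 2 partitions
res_alp = linprog(Phi.T @ c, A_ub=A_ub @ Phi, b_ub=b_ub,
                  bounds=[(None, None)] * Phi.shape[1])
print("partitioned upper bound:", Phi @ res_alp.x)
```

The key property is that feasibility alone certifies the bound: any vector satisfying the Bellman-inequality constraints lies above V* pointwise, so both the exact LP and the partitioned LP return certified upper bounds. The dissertation's tractable lower bound and the sub-optimal policy whose performance exceeds it go beyond this basic sketch.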
Keywords/Search Tags: Surveillance, Sub-optimal, Decision, Robot, Dissertation, Value function, Policy, Problem