
Exploiting insensitivity in stochastic systems to learn approximately optimal policies

Posted on: 2013-09-12    Degree: Ph.D.    Type: Thesis
University: University of Illinois at Urbana-Champaign    Candidate: Davidson, James C    Full Text: PDF
GTID: 2450390008990432    Subject: Computer Engineering
Abstract/Summary:
How does uncertainty affect a robot when attempting to generate a control policy to achieve some objective? How sensitive is the obtained control policy to perturbations? These are the central questions addressed in this dissertation. For most real-world robotic systems, the state of the system is observed only indirectly through limited sensor modalities. Since the actual state of the robot is not fully observable, it must be inferred from partial observations. Further complicating matters, the system may be subject to disturbances that perturb not only the evolution of the system but also the sensor data. Determining policies that effectively and efficiently govern the behavior of the system relative to a stated objective becomes computationally burdensome and, for many systems, impractical to solve exactly. Thus, much research has been devoted to determining approximately optimal solutions for these partially observable Markov decision processes (POMDPs).

The techniques presented herein exploit the inherent insensitivity in POMDPs, based on the notion that small changes in a policy have little impact on the quality of the solution except at a small set of critical points. First, a hierarchical method for determining nearly optimal policies is presented that achieves temporal and spatial abstraction through local approximations. Using a mixed simulation and analytic representation, a directed graph is generated that captures the underlying POMDP structure. The result is a multi-query method for generating the structural representation offline. The graph is built by randomly sampling vertices; local policies are then used to connect each newly added vertex, and an edge is added whenever the local policy succeeds. By continuing to extend the graph at each iteration of the algorithm, a sparse representation is obtained. Theoretical and simulation-based results demonstrate the effectiveness of this approach.

The second technique extends the first to an anytime algorithm. Adaptive sampling is used to quickly and effectively determine nearly optimal policies. Alternating between exploitation and exploration sampling, the structural representation is expanded using an inductive bias drawn from the past performance of the sampling algorithm in the neighborhood of a prospective sample. In this way, the algorithm preferentially samples policies that are more likely to improve exploration and to increase connectivity in lower-cost regions of the space.

Finally, a perturbation analysis framework is developed, which serves two purposes. First, the derived analysis is used to support the hypothesis that POMDPs are often insensitive and to identify when they are not. Second, the framework enables the chaining of forecasted evolutions into a compact representation, which provides even greater temporal and spatial abstraction in an analytic form.
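To make the graph-construction step concrete, the following is a minimal sketch of the offline, multi-query roadmap loop described above: sample a vertex, attempt local-policy connections to nearby vertices in simulation, and keep an edge only when the connection succeeds. All names here (`sample_belief`, `local_policy_reaches`, `distance`) are hypothetical stand-ins, not the dissertation's actual interfaces.

```python
import random

def build_policy_graph(sample_belief, local_policy_reaches, distance,
                       n_vertices=200, k_nearest=5):
    """Grow a sparse directed graph over randomly sampled vertices.

    sample_belief()            -> a randomly sampled vertex (e.g. a belief state)
    local_policy_reaches(a, b) -> True if a local policy steered the
                                  simulated system from a to b
    distance(a, b)             -> nonnegative similarity measure between vertices
    """
    vertices = []
    edges = {}  # vertex index -> set of successor indices

    for _ in range(n_vertices):
        v = sample_belief()
        v_idx = len(vertices)
        vertices.append(v)
        edges[v_idx] = set()

        # Try to connect the new vertex to its nearest existing neighbors.
        candidates = sorted(range(v_idx),
                            key=lambda i: distance(vertices[i], v))
        for u_idx in candidates[:k_nearest]:
            # An edge is added only if the local policy succeeds in simulation.
            if local_policy_reaches(vertices[u_idx], v):
                edges[u_idx].add(v_idx)
            if local_policy_reaches(v, vertices[u_idx]):
                edges[v_idx].add(u_idx)

    return vertices, edges
```

Because connection attempts are limited to a few nearest neighbors and edges require a successful local policy, the resulting graph stays sparse while still capturing the reachable structure of the problem.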
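The exploitation/exploration trade-off in the anytime extension can likewise be illustrated with a simple region-scoring rule: regions where past connection attempts succeeded at low cost are favored, while rarely tried regions receive an exploration bonus. The fields and the UCB-style bonus below are illustrative assumptions, not the dissertation's exact inductive bias.

```python
import math
import random

def adaptive_sample(regions, explore_weight=0.3):
    """Pick a region to sample from, trading off exploration and exploitation.

    regions: list of dicts with (assumed) keys
        'successes' - past successful connections in this region
        'attempts'  - total connection attempts in this region
        'cost'      - best nonnegative solution cost found through this region
    """
    total_attempts = sum(r['attempts'] for r in regions) + 1

    def score(r):
        # Exploitation: favor regions with a high success rate and low cost.
        exploit = (r['successes'] + 1) / (r['attempts'] + 2) / (1.0 + r['cost'])
        # Exploration: favor regions tried rarely (UCB-style bonus).
        explore = math.sqrt(math.log(total_attempts) / (r['attempts'] + 1))
        return exploit + explore_weight * explore

    weights = [score(r) for r in regions]
    return random.choices(regions, weights=weights, k=1)[0]
```

Weighted random selection, rather than always taking the top-scoring region, keeps the sketch anytime-friendly: every region retains some probability of being refined as the budget grows.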
Keywords/Search Tags: System, Representation, Optimal, Policy