Font Size: a A A

The Design And Implementation Of Point-based Policy Iteration Algorithms For POMDP

Posted on:2014-01-20Degree:MasterType:Thesis
Country:ChinaCandidate:W LiFull Text:PDF
GTID:2248330395495723Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Planning under uncertainty is increasingly important in research and application field, and so is the robust and scalable algorithm under uncertainty for development of effective autonomous and semi-autonomous systems. Partially Observable Markov Decision Processes (POMDPs) is a framework for optimal decision making in uncertainty environment. However, for the disaster of history and dimension, exact method for POMDP is not practical for real word application. Though policy iteration method is efficient than value iteration, its application in real word also restricted in some domain of small domain.Approximate solutions, especially point-based algorithms make POMDP practical. By collecting important belief points of the space and then planning over those points, point-based algorithms gain approximation of the optimal policy.This thesis introduce the MDP and its generalization to POMDP, and as well the value function and iteration process. Through a brief introduction of the exact solutions, including the policy iteration method and the point-based solutions (PBVI, HSVI, FSVI), we give the analysis of time complexity of them, and conclude the reason for exact methods’ impractical for real world problems. We also details the differences in point collection and backup of them.This thesis comes up with two new algorithms, PIPBVI and PBHSVI and conduct experiments in four benchmark data sets. Through the comparison with other classic point-based algorithms, we analyze and discuss their performance in different problems. PIPBVI gains improvement from PBVI by the pruning of belief sets and vector sets, and the interpolation of points where will improvement the value function. PBHSVI combines the advantages of policy iteration and heuristic searching policy of point-based methods, by using interleaving step of straightforward policy iteration over the state controllers and backup operation over corresponding points, PBHSVI speeds up the convergence without loss of the optimality of policy.
Keywords/Search Tags:MDP, POMDP, Policy iteration, Point-based algorithm
PDF Full Text Request
Related items