
Heuristic Probabilistic Value Iteration: An Approximation Framework for POMDPs

Posted on: 2015-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Du
Full Text: PDF
GTID: 2308330461457936
Subject: Computer Science and Technology
Abstract/Summary:
The partially observable Markov decision process (POMDP) is an important model in machine learning (ML). As a generalization of the Markov decision process (MDP), the POMDP model can simulate continuous processes in the real world. POMDPs have been widely used in robot navigation, mechanical maintenance, and planning under uncertainty. Because of the curse of dimensionality and the curse of history, exact value iteration is too expensive for practical applications. In recent years, approximate methods have been proposed to make large-scale POMDPs solvable, and point-based value iteration algorithms are a focus of current research.

This paper describes the MDP and POMDP models, together with the concepts of policy and value function defined on them. By analyzing the complexity of exact value iteration, it explains why practical application is hindered. After introducing the popular point-based value iteration algorithms, it summarizes what they have in common and where they differ.

This paper proposes the Heuristic Probabilistic Value Iteration (HPVI) algorithm. HPVI is an approximation framework driven by probabilities, where the probabilities are derived from heuristic criteria. Two specific criteria are implemented within the framework: Density-Based Heuristic Probabilistic Value Iteration (HPVI-D) and Value-Function-Based Heuristic Probabilistic Value Iteration (HPVI-VF). HPVI uses the available information more effectively than algorithms based on density alone, obtains better results than algorithms based on a single bound, and converges faster than algorithms based on compound bounds. In experiments on 6 benchmark problems, compared with other approximate methods, HPVI obtains relatively good results in a short time. The relative performance of HPVI-D and HPVI-VF varies across problems, so the two heuristic criteria suit different practical applications.
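As an illustration of the point-based value iteration family the abstract refers to, the following is a minimal sketch of a Bayesian belief update and a single point-based Bellman backup. It shows the generic backup used by PBVI-style solvers, not the HPVI algorithm itself, whose heuristic probabilities are not specified in this abstract. The function names, the array layout (`T[a][s][s2]`, `Z[a][s2][o]`, `R[a][s]`), and the two-state toy model are all assumptions made for the example.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def belief_update(b: Vector, a: int, o: int, T, Z) -> Vector:
    """Bayes filter: b'(s2) is proportional to Z[a][s2][o] * sum_s T[a][s][s2] * b[s]."""
    n = len(b)
    unnorm = [
        Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [x / total for x in unnorm]


def point_backup(b: Vector, alphas: List[Vector], T, Z, R, gamma: float) -> Vector:
    """One point-based Bellman backup at belief b; returns the best new alpha vector."""
    n_states, n_actions, n_obs = len(b), len(T), len(Z[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(n_actions):
        vec = [R[a][s] for s in range(n_states)]
        for o in range(n_obs):
            # Project every existing alpha vector back through (a, o).
            projected = [
                [
                    sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this particular belief.
            best_proj = max(projected, key=lambda g: dot(g, b))
            vec = [v + gamma * g for v, g in zip(vec, best_proj)]
        val = dot(vec, b)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


# Hypothetical toy model: 2 states, 2 actions (0 = listen, 1 = act), 2 observations.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
Z = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [10.0, -100.0]]

new_belief = belief_update([0.5, 0.5], a=0, o=0, T=T, Z=Z)
new_alpha = point_backup([0.5, 0.5], [[0.0, 0.0]], T, Z, R, gamma=0.95)
```

Because the backup is evaluated only at a chosen set of belief points, the work per iteration grows with the number of points rather than with the full set of vectors that exact value iteration would generate. How those points are selected is where point-based algorithms differ from one another.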
Keywords/Search Tags: POMDP, PBVI, HPVI, HPVI-D, HPVI-VF