
Research On Point-based Value Iteration Algorithms In POMDP Domains

Posted on: 2016-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: J H Fang  Full Text: PDF
GTID: 2308330464953300  Subject: Software engineering
Abstract/Summary:
Partially Observable Markov Decision Process (POMDP) is an extension of the Markov Decision Process (MDP). Because the agent has only partial knowledge of the state of the environment in the POMDP framework, POMDP problems are very difficult to solve. The basic idea of point-based value iteration brought a significant breakthrough: instead of computing the value function over the entire belief space, it maintains a finite subset of belief points and updates the value function only at those points (a sketch of this backup follows the abstract). A number of POMDP solvers have been developed from this idea, differing mainly in how they select the belief-space subset and in the order of value-function updates. To address the shortcomings of current algorithms, this thesis studies selection methods for the belief-space subset and proposes several improved point-based approximate value iteration algorithms.

i. To address the problem that classic POMDP solvers collect belief states in an uncontrolled manner, this thesis proposes an improved heuristic search value iteration algorithm. It uses reachability as the heuristic criterion to search for belief points of greater value, then updates the value function locally over these points, yielding an effective approximately optimal policy.

ii. Among the approximate algorithms for solving POMDP problems, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. Unfortunately, HSVI must maintain and update both a lower bound and an upper bound on the optimal value function, and the expensive computation required to update the upper bound greatly degrades its performance. To overcome this drawback, this thesis presents an improved forward search value iteration algorithm that uses the optimal policy of the underlying MDP to select belief points and does not need to maintain an upper bound on the optimal value function.

iii. Because solvers for complete POMDPs scale poorly, this thesis proposes to decompose a factored POMDP into a set of restricted POMDPs and solve each such model independently, obtaining a value function for each. The value functions of the restricted POMDPs are then combined to form a policy for the complete POMDP. The method explains how to identify state variables that correspond to independent tasks and how to create a model restricted to a single task.
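The point-based backup shared by all of the algorithms above can be summarized in a short sketch. This is a minimal illustration rather than the thesis's own code: the model representation (transition matrices `T`, observation matrices `O`, reward vectors `R`, discount `gamma`) and all function names are assumptions made for the example.

```python
import numpy as np

def point_based_backup(B, V, T, O, R, gamma):
    """One point-based value iteration step.

    B: list of belief vectors of length |S|
    V: current list of alpha-vectors (length-|S| arrays)
    T[a]: |S| x |S| transition matrix for action a
    O[a]: |S| x |Z| observation matrix, O[a][s', z] = P(z | s', a)
    R[a]: length-|S| immediate reward vector for action a
    """
    new_V = []
    for b in B:
        best_alpha, best_val = None, -np.inf
        for a in range(len(T)):
            alpha_a = R[a].astype(float)
            for z in range(O[a].shape[1]):
                # Candidate back-projected vectors g_{a,z}^i, one per alpha_i in V.
                g = [gamma * T[a] @ (O[a][:, z] * alpha) for alpha in V]
                # Keep the candidate that is best at this belief point.
                alpha_a = alpha_a + g[int(np.argmax([b @ gi for gi in g]))]
            if b @ alpha_a > best_val:
                best_alpha, best_val = alpha_a, b @ alpha_a
        new_V.append(best_alpha)
    return new_V
```

Starting from a single conservative vector such as `V = [np.full(n_states, R_min / (1 - gamma))]`, repeated backups over a growing belief set B improve the lower bound on the optimal value function; the contributions differ chiefly in how B is grown.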
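One plausible reading of contribution i is sketched below: candidate successor beliefs are ranked by the probability of reaching them, and only sufficiently novel, highly reachable points are added to the collected set. The threshold `eps` and the L1 novelty test are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayesian belief update; the normalizer is P(z | b, a), i.e. reachability."""
    b_next = (b @ T[a]) * O[a][:, z]
    p = b_next.sum()
    return (b_next / p, p) if p > 0 else (None, 0.0)

def expand_by_reachability(B, T, O, eps=1e-2):
    new_points = []
    for b in B:
        best_b, best_p = None, 0.0
        for a in range(len(T)):
            for z in range(O[a].shape[1]):
                b_next, p = belief_update(b, a, z, T, O)
                if b_next is None:
                    continue
                # Prefer the most reachable successor that is not already
                # close (in L1 distance) to a collected belief point.
                novelty = min(np.linalg.norm(b_next - bb, 1)
                              for bb in B + new_points)
                if p > best_p and novelty > eps:
                    best_b, best_p = b_next, p
        if best_b is not None:
            new_points.append(best_b)
    return B + new_points
```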
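Contribution ii follows the forward search value iteration idea: beliefs are collected by acting according to the optimal policy of the underlying MDP on a sampled hidden state, so no upper bound on the value function is ever maintained. The sketch below reuses `belief_update` from the previous sketch; `mdp_policy` (a state-to-action map obtained by solving the fully observable MDP first) and the other names are assumptions.

```python
import numpy as np

def fsvi_trial(b0, s0, mdp_policy, T, O, rng, max_depth=50):
    """Collect a trajectory of beliefs by simulating the underlying MDP policy."""
    trajectory, b, s = [b0], b0, s0
    for _ in range(max_depth):
        a = mdp_policy[s]                         # act as if the state were observable
        s = rng.choice(len(b0), p=T[a][s])        # sample the next hidden state
        z = rng.choice(O[a].shape[1], p=O[a][s])  # sample an observation at s'
        b, _ = belief_update(b, a, z, T, O)
        if b is None:
            break
        trajectory.append(b)
    return trajectory  # beliefs are then backed up in reverse order
```

A trial would run as `fsvi_trial(b0, s0, pi, T, O, np.random.default_rng())`, followed by `point_based_backup` applied to the collected beliefs in reverse order, deepest belief first.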
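For contribution iii, the abstract does not specify how the restricted value functions are combined; the sketch below assumes one simple rule, summing each restricted model's value for a shared action, purely for illustration. The projection `proj` (mapping the full belief onto one task's state variables) and the `(proj, V_by_action)` representation are hypothetical.

```python
import numpy as np

def combined_policy(b, restricted_models):
    """restricted_models: list of (proj, V_by_action) pairs, where proj maps
    the full belief onto one task's variables and V_by_action maps an action
    to that restricted model's alpha-vectors."""
    best_a, best_val = None, -np.inf
    for a in restricted_models[0][1]:
        # Score action a by adding up the value each restricted model assigns.
        val = sum(max(proj(b) @ alpha for alpha in V_by_action[a])
                  for proj, V_by_action in restricted_models)
        if val > best_val:
            best_a, best_val = a, val
    return best_a
```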
Keywords/Search Tags: MDP, POMDP, point-based value iteration, belief space, heuristic search