
Research On Point-based Value Iteration Algorithms In POMDP Domains

Posted on: 2016-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: J H Fang  Full Text: PDF
GTID: 2308330464953300  Subject: Software engineering
Abstract/Summary:
Partially Observable Markov Decision Process (POMDP) is an extension of the Markov Decision Process (MDP). Because the agent has only partial knowledge of the state of the environment in the POMDP framework, POMDP problems are very difficult to solve. The basic idea of point-based value iteration brought a significant breakthrough: instead of computing the value function over the entire belief space, it maintains a finite subset of belief points and updates the value function only at those points (a sketch of this backup follows the abstract). A number of POMDP solvers have been developed from this idea, differing mainly in how they select the belief-space subset and in the order of value-function updates. To address the shortcomings of current algorithms, this thesis studies selection methods for the belief-space subset and proposes several improved point-based approximate value iteration algorithms.

i. To address the problem that classic POMDP solvers collect belief states in an uncontrolled manner, this thesis proposes an improved heuristic search value iteration algorithm. It uses reachability as the heuristic criterion to search for belief points of greater value, then updates the value function locally over these points, yielding an effective approximately optimal policy.

ii. Among the approximate algorithms for solving POMDP problems, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. Unfortunately, HSVI must maintain and update both a lower bound and an upper bound on the optimal value function, and the expensive computation required to update the upper bound greatly degrades its performance. To overcome this drawback, this thesis presents an improved forward search value iteration algorithm that uses the optimal policy of the underlying MDP to select belief points and does not need to maintain an upper bound on the optimal value function.

iii. Because solvers for complete POMDPs scale poorly, this thesis proposes to decompose a factored POMDP into a set of restricted POMDPs and solve each such model independently, obtaining a value function for each. The value functions of the restricted POMDPs are then combined to form a policy for the complete POMDP. The method explains how to identify state variables that correspond to independent tasks and how to create a model restricted to a single task.
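The point-based backup shared by all of the algorithms above can be summarized in a short sketch. This is a minimal illustration rather than the thesis's own code: the model representation (transition matrices `T`, observation matrices `O`, reward vectors `R`, discount `gamma`) and all function names are assumptions made for the example.

```python
import numpy as np

def point_based_backup(B, V, T, O, R, gamma):
    """One point-based value iteration step.

    B: list of belief vectors of length |S|
    V: current list of alpha-vectors (length-|S| arrays)
    T[a]: |S| x |S| transition matrix for action a
    O[a]: |S| x |Z| observation matrix, O[a][s', z] = P(z | s', a)
    R[a]: length-|S| immediate reward vector for action a
    """
    new_V = []
    for b in B:
        best_alpha, best_val = None, -np.inf
        for a in range(len(T)):
            alpha_a = R[a].astype(float)
            for z in range(O[a].shape[1]):
                # Candidate back-projected vectors g_{a,z}^i, one per alpha_i in V.
                g = [gamma * T[a] @ (O[a][:, z] * alpha) for alpha in V]
                # Keep the candidate that is best at this belief point.
                alpha_a = alpha_a + g[int(np.argmax([b @ gi for gi in g]))]
            if b @ alpha_a > best_val:
                best_alpha, best_val = alpha_a, b @ alpha_a
        new_V.append(best_alpha)
    return new_V
```

Starting from a single conservative vector such as `V = [np.full(n_states, R_min / (1 - gamma))]`, repeated backups over a growing belief set B improve the lower bound on the optimal value function; the contributions differ chiefly in how B is grown.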
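One plausible reading of contribution i is sketched below: candidate successor beliefs are ranked by the probability of reaching them, and only sufficiently novel, highly reachable points are added to the collected set. The threshold `eps` and the L1 novelty test are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayesian belief update; the normalizer is P(z | b, a), i.e. reachability."""
    b_next = (b @ T[a]) * O[a][:, z]
    p = b_next.sum()
    return (b_next / p, p) if p > 0 else (None, 0.0)

def expand_by_reachability(B, T, O, eps=1e-2):
    new_points = []
    for b in B:
        best_b, best_p = None, 0.0
        for a in range(len(T)):
            for z in range(O[a].shape[1]):
                b_next, p = belief_update(b, a, z, T, O)
                if b_next is None:
                    continue
                # Prefer the most reachable successor that is not already
                # close (in L1 distance) to a collected belief point.
                novelty = min(np.linalg.norm(b_next - bb, 1)
                              for bb in B + new_points)
                if p > best_p and novelty > eps:
                    best_b, best_p = b_next, p
        if best_b is not None:
            new_points.append(best_b)
    return B + new_points
```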
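Contribution ii follows the forward search value iteration idea: beliefs are collected by acting according to the optimal policy of the underlying MDP on a sampled hidden state, so no upper bound on the value function is ever maintained. The sketch below reuses `belief_update` from the previous sketch; `mdp_policy` (a state-to-action map obtained by solving the fully observable MDP first) and the other names are assumptions.

```python
import numpy as np

def fsvi_trial(b0, s0, mdp_policy, T, O, rng, max_depth=50):
    """Collect a trajectory of beliefs by simulating the underlying MDP policy."""
    trajectory, b, s = [b0], b0, s0
    for _ in range(max_depth):
        a = mdp_policy[s]                         # act as if the state were observable
        s = rng.choice(len(b0), p=T[a][s])        # sample the next hidden state
        z = rng.choice(O[a].shape[1], p=O[a][s])  # sample an observation at s'
        b, _ = belief_update(b, a, z, T, O)
        if b is None:
            break
        trajectory.append(b)
    return trajectory  # beliefs are then backed up in reverse order
```

A trial would run as `fsvi_trial(b0, s0, pi, T, O, np.random.default_rng())`, followed by `point_based_backup` applied to the collected beliefs in reverse order, deepest belief first.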
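For contribution iii, the abstract does not specify how the restricted value functions are combined; the sketch below assumes one simple rule, summing each restricted model's value for a shared action, purely for illustration. The projection `proj` (mapping the full belief onto one task's state variables) and the `(proj, V_by_action)` representation are hypothetical.

```python
import numpy as np

def combined_policy(b, restricted_models):
    """restricted_models: list of (proj, V_by_action) pairs, where proj maps
    the full belief onto one task's variables and V_by_action maps an action
    to that restricted model's alpha-vectors."""
    best_a, best_val = None, -np.inf
    for a in restricted_models[0][1]:
        # Score action a by adding up the value each restricted model assigns.
        val = sum(max(proj(b) @ alpha for alpha in V_by_action[a])
                  for proj, V_by_action in restricted_models)
        if val > best_val:
            best_a, best_val = a, val
    return best_a
```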
Keywords/Search Tags: MDP, POMDP, point-based value iteration, belief space, heuristic search