The Research And Design Of Approximation Methods For POMDP

Posted on:2018-05-25

Degree:Master

Type:Thesis

Country:China

Candidate:B B Liu

Full Text:PDF

GTID:2348330515497279

Subject:Control Science and Engineering

Abstract/Summary:

PDF Full Text Request

POMDP provides a mathematical framework for decision problems in uncertain conditions.It has great potential in spoken dialog systems,robot control,medical diag-nosis and other domains.However,due to the curses of history and dimensionality,the exact solution algorithm of POMDP is NP-Hard,which greatly limits its application in practice.In recent years,approximation methods especially point-based methods have made great progress in computing good policies.Point-based algorithm only considers the reachable space of the initial belief point,and iterates on the sampling points of the reachable space.The difference between the different algorithms lies mainly in the sampling method and the iterative strategy.The representative algorithms are point-based value iteration(PBVI),forward search value iteration(FSVI),and heuristic search value iteration(HSVI).They can usually obtain optimal or near optimal policies.Another important approximation algorithm is based on the approximation of iterative functions,such as MDP-based approximation(QMDP),fast informed bound method(FIB).The results of them are the upper or lower bounds of exact value functions.These algorithms are usually simple and fast,able to deal with large-scale problems,but there is no guarantee of the quality of the resulting strategies.In order to get a good lower bound in a short time,this thesis presents a method�Relevant State Update(RSU).Its main idea is to approximate the improvement of a belief by the improvement of its possible states.It makes use of the underlying MDP to elect states that are contained in optimally reachable belief space.Value iterations are performed over these states simply.Further,topological structure of states is used to reorder the iterations of states to accelerate the convergence speed.Based on the upper and lower bounds,a point-based algorithm MHSVI is proposed,which generates the belief point path based on the possible optimal value function,eval-uates the possible values of the path,and prunes the path according to the evaluation value,so that the value function can converge quickly.The proposed algorithms and the existing algorithms are experimented on several representative problems and the experimental results show the effectiveness of RSU and MHSVI.

Keywords/Search Tags:

POMDP, point-based value iteration, heuristic search value iteration, reachable space

PDF Full Text Request

Related items

1	Research On Point-based Value Iteration Algorithms In POMDP Domains
2	The Design And Implementation Of Point-based Policy Iteration Algorithms For POMDP
3	The Research And Design Of Point-based POMDP Value Iteration Algorithm
4	The Design And Implementation Of Point-based POMDP Policy Iteration Algorithm
5	Research On Approximate Algorithms Of POMDP And Application To TCM Therapy Planning
6	Heuristic Probabilistic Value Iteration:an Approximation Framework For POMDPs
7	L₁-norm Minimization Method Based On Bregman Iteration And Its Applications
8	Research On Robot Navigation Based On POMDP
9	Research And Implementation Of Multi-plex Iteration Based On MapReduce
10	Understanding the Impact of Support for Iteration on Code Searc