Font Size: a A A

The Research And Design Of Approximation Methods For POMDP

Posted on:2018-05-25Degree:MasterType:Thesis
Country:ChinaCandidate:B B LiuFull Text:PDF
GTID:2348330515497279Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
POMDP provides a mathematical framework for decision problems in uncertain conditions.It has great potential in spoken dialog systems,robot control,medical diag-nosis and other domains.However,due to the curses of history and dimensionality,the exact solution algorithm of POMDP is NP-Hard,which greatly limits its application in practice.In recent years,approximation methods especially point-based methods have made great progress in computing good policies.Point-based algorithm only considers the reachable space of the initial belief point,and iterates on the sampling points of the reachable space.The difference between the different algorithms lies mainly in the sampling method and the iterative strategy.The representative algorithms are point-based value iteration(PBVI),forward search value iteration(FSVI),and heuristic search value iteration(HSVI).They can usually obtain optimal or near optimal policies.Another important approximation algorithm is based on the approximation of iterative functions,such as MDP-based approximation(QMDP),fast informed bound method(FIB).The results of them are the upper or lower bounds of exact value functions.These algorithms are usually simple and fast,able to deal with large-scale problems,but there is no guarantee of the quality of the resulting strategies.In order to get a good lower bound in a short time,this thesis presents a method—Relevant State Update(RSU).Its main idea is to approximate the improvement of a belief by the improvement of its possible states.It makes use of the underlying MDP to elect states that are contained in optimally reachable belief space.Value iterations are performed over these states simply.Further,topological structure of states is used to reorder the iterations of states to accelerate the convergence speed.Based on the upper and lower bounds,a point-based algorithm MHSVI is proposed,which generates the belief point path based on the possible optimal value function,eval-uates the possible values of the path,and prunes the path according to the evaluation value,so that the value function can converge quickly.The proposed algorithms and the existing algorithms are experimented on several representative problems and the experimental results show the effectiveness of RSU and MHSVI.
Keywords/Search Tags:POMDP, point-based value iteration, heuristic search value iteration, reachable space
PDF Full Text Request
Related items