The Design And Implementation Of Point-based POMDP Policy Iteration Algorithm

Posted on:2015-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:B Han

Full Text:PDF

GTID:2308330461456663

Subject:Software engineering

Abstract/Summary:

Making a sequence of decisions when the state of the environment is not fully known has long been an important topic in artificial intelligence. Partially observable Markov decision processes (POMDPs) let an agent act on incomplete observations of its environment and still choose high-reward actions, which gives them considerable practical value. However, solving a POMDP exactly is computationally intractable (NP-hard), so exact methods are of very limited use in practice. Current work therefore relies mainly on approximate methods, which scale better than exact ones and apply to a wider range of problems.

This thesis first introduces the concepts and mathematical models of the Markov decision process (MDP) and its extension, the partially observable Markov decision process (POMDP). After describing and discussing exact solution methods, it introduces several practical approximation algorithms and compares representative ones, including PBVI, Perseus, and PBPI. The comparison focuses on how each algorithm selects its belief point set and how it iterates the value function.

Building on these algorithms and existing research results, the thesis proposes PCFBPI (Point Clustering Feature Based Policy Iteration), an algorithm that combines clustering with point-based policy iteration. The work studies how reachable points are distributed in the belief space and uses the characteristics of that distribution to guide the selection and expansion of the point set. An implementation of the proposed algorithm is then presented.

Finally, an experiment compares PCFBPI with PBPI on several representative POMDP models. The results show that clustering the reachable points gives PCFBPI an improvement over PBPI, although on small models the advantage is not obvious.
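
To make the notion of "reachable points in the belief space" concrete, here is a minimal Python sketch of the standard Bayesian belief update and a one-step expansion of a belief point set. It is not the thesis's PCFBPI procedure, whose details the abstract does not give. The two-state "tiger" model, the names `T`, `Z`, `belief_update`, `l1_distance`, and `expand`, and the L1-distance filter used to skip near-duplicate points are all assumptions made for illustration.

```python
from typing import List, Optional

Belief = List[float]

# Hypothetical two-state "tiger" model (state 0 = tiger left, 1 = tiger right).
# T[a][s][s2] = P(s2 | s, a);  Z[a][s2][o] = P(o | s2, a)
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # listen: state is unchanged
    [[0.5, 0.5], [0.5, 0.5]],  # open left: problem resets uniformly
    [[0.5, 0.5], [0.5, 0.5]],  # open right: problem resets uniformly
]
Z = [
    [[0.85, 0.15], [0.15, 0.85]],  # listen: observation is 85% accurate
    [[0.5, 0.5], [0.5, 0.5]],      # open left: observation is uninformative
    [[0.5, 0.5], [0.5, 0.5]],      # open right: observation is uninformative
]


def belief_update(b: Belief, a: int, o: int) -> Optional[Belief]:
    """Bayes update: b'(s2) is proportional to Z[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(b)
    unnorm = [
        Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        return None  # observation o is impossible after action a from belief b
    return [u / total for u in unnorm]


def l1_distance(p: Belief, q: Belief) -> float:
    return sum(abs(x - y) for x, y in zip(p, q))


def expand(points: List[Belief], min_dist: float) -> List[Belief]:
    """Add every one-step successor belief that lies at least min_dist
    (in L1 distance) from all points already kept."""
    result = [list(p) for p in points]
    for b in points:
        for a in range(len(T)):
            for o in range(len(Z[a][0])):
                nb = belief_update(b, a, o)
                if nb is not None and all(l1_distance(nb, p) >= min_dist for p in result):
                    result.append(nb)
    return result


reachable = expand([[0.5, 0.5]], min_dist=0.05)
print(reachable)
```

Calling `expand` repeatedly on its own output grows the set of reachable beliefs step by step. The distance threshold is only a simple stand-in for the idea of exploiting how reachable points are distributed; a clustering-based selection such as the one the thesis proposes would replace that filter.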

Keywords/Search Tags:

Partially Observable Markov Decision Process, Clustering, Point-based Policy Iteration, Sequential Decision Problem

Related items

1	The Research And Design Of Point-based POMDP Value Iteration Algorithm
2	Agent Sequential Decision-making Approach And Its Application Under Uncertain Environment
3	Deep Value Iteration Network For Partially Observable Markov Decision Process
4	Heuristic Learning Model Based On Partially Observable Markov Decision Process
5	Learning partially observable Markov decision processes using abstract actions
6	Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments
7	Markov Theory Based Planning And Sensing Under Uncertainty
8	Research On Path Planning Based On Markov Decision Process For AUV
9	Research On Path Planning Based On Markov Decision Processes For Palletizing Robot
10	Research On Optimization Of Service Composition Based On Partially Observable Environment