
The Design And Implementation Of Point-based POMDP Policy Iteration Algorithm

Posted on: 2015-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Han
Full Text: PDF
GTID: 2308330461456663
Subject: Software engineering
Abstract/Summary:
Making a sequence of decisions when state information is not fully known has long been an important topic in artificial intelligence. A partially observable Markov decision process (POMDP) lets an agent act on incompletely observed state information from the environment and still develop high-reward decisions, which gives the model high practical value. However, solving a POMDP exactly is computationally intractable (NP-hard), so exact methods are of very limited practical use. Approximation methods are now mainly used instead; they perform better than the exact methods and apply to a wider range of problems.
This thesis first introduces the concepts and mathematical models of the Markov decision process (MDP) and its extension, the partially observable Markov decision process (POMDP). After describing and discussing the exact solution methods, it introduces several practical approximation algorithms and compares representative ones, including PBVI, Perseus, and PBPI. The comparison focuses on how each algorithm selects its point set and iterates its value function.
Building on existing algorithms and research results, this thesis proposes the PCFBPI (Point Clustering Feature Based Policy Iteration) algorithm, which combines a clustering method with point-based policy iteration. The work focuses on the distribution of reachable points in the belief space and discusses strategies for selecting and expanding the point set that exploit the characteristics of this distribution. The implementation code for the proposed algorithm is then presented.
Following the presentation of the implementation, an experiment compares PCFBPI with PBPI on several representative POMDP models. The experimental results show that, by clustering reachable points, the proposed algorithm improves on PBPI. On problems with small models, however, PCFBPI's advantage is not obvious.
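The notion of reachable points in the belief space can be illustrated with a minimal sketch. It assumes a hypothetical two-state toy model (the tables `T` and `O`, the action names, and the helper names `belief_update` and `expand_reachable` are illustrative and do not come from the thesis). It shows the standard Bayes belief update and a one-step expansion of a belief point set; it is not the PCFBPI point selection or clustering procedure itself.

```python
from typing import Dict, List, Optional

# Hypothetical toy model (illustrative only, not from the thesis).
# T[a][s][s2] = P(s2 | s, a); O[a][s2][o] = P(o | s2, a)
T: Dict[str, List[List[float]]] = {
    "listen": [[1.0, 0.0], [0.0, 1.0]],
    "move": [[0.5, 0.5], [0.5, 0.5]],
}
O: Dict[str, List[List[float]]] = {
    "listen": [[0.85, 0.15], [0.15, 0.85]],
    "move": [[0.5, 0.5], [0.5, 0.5]],
}


def belief_update(b: List[float], a: str, o: int) -> Optional[List[float]]:
    """Bayes filter: b'(s2) is proportional to O(o|s2,a) * sum_s T(s2|s,a) * b(s)."""
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0.0:
        return None  # observation o is impossible after action a from belief b
    return [p / z for p in unnorm]


def expand_reachable(points: List[List[float]], decimals: int = 6) -> List[List[float]]:
    """Add every belief reachable in one step from the given set (deduplicated)."""
    def key(b: List[float]) -> tuple:
        return tuple(round(p, decimals) for p in b)

    seen = {key(b) for b in points}
    out = [list(b) for b in points]
    for b in points:
        for a in T:
            for o in range(len(O[a][0])):
                nb = belief_update(b, a, o)
                if nb is not None and key(nb) not in seen:
                    seen.add(key(nb))
                    out.append(nb)
    return out


if __name__ == "__main__":
    b0 = [0.5, 0.5]  # uniform initial belief
    print(belief_update(b0, "listen", 0))
    print(len(expand_reachable([b0])))
```

Point-based methods such as PBVI grow their belief set through this kind of forward simulation from an initial belief; how the resulting points are then clustered and selected is specific to each algorithm and is not reproduced here.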
Keywords/Search Tags: Partially Observable Markov Decision Process, Clustering, Point-based Policy Iteration, Sequential Decision Problem