Font Size: a A A

Research On Applicable Approximate POMDP Solution Algorithm

Posted on:2010-11-10Degree:MasterType:Thesis
Country:ChinaCandidate:G M XiuFull Text:PDF
GTID:2178330338989492Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Sequential decision making in dynamic and uncertain contexts is one of the kernel problems in investigating the Agent's interaction strategy with the environment in AI domain. As applicaton systems grow complex, lots of practical problems can be abstracted to dynamic and uncertain sequential decision making ones, and this makes the corresponding research very senseful. POMDP, which is regarded as a powerful and flexible solution framework for dynamic and uncertain decision making problems that meet the Markov requirements, becomes the research focus.In this paper, we intend to develop applicable solution algorithms to find the optimal POMDP strategies.Instance-based algorithms, including NNI, LWI and ENNI are developed for widely applicable purpose. Algorithms in this category integrate the instance-based learning method and reinforcement learning technique. The former is used to obtain the exact interaction data of the Agent and because of its characteristic of posing few requirements on the model it contributes to the wide applicability of the final algorithms, including application in context of not only discrete Markov case, but the continuous state and non-Morkov situation. The latter is exploited to search the strategy space and find the increasingly optimal strategies. Then based on the strategy related data acquired by the learning techniques, heuristic solution method is utilized to achieve the optimal strategies. The test data show that instance-based algorithms can obtain better strategies than Q-MDP algorithm even without the model related parameters.The kernel-belief based algorithm KBVI is developed in order to handle the complexity of the POMDP solution algorithms and to realize efficient solution algorithm. KBVI samples the belief states which are reachable from initial belief states to acquire the information related to the structure of specific problems, and then based on these information approximately solve POMDP problems by using value iteration method to find the optimal interaction strategies. KBVI can reduce the complexity of solving POMDP problems to polynomial level, and in comparison with other point-base algorithm, it can achieve equivalent or even better strategies more efficiently.A platform for ruuning Agent is developed to acquire the necessary data associated with the algorithms, to solve POMDP problems by some algorithm and to run some interaction strategy and test its performance.It consists of the Agent model and the environment model. On this platform some typical POMDP problems are solved by the instance-based algorithms and KBVI, and the resulting strategies are tested. Experimental results prove the good performance of the algorithms in this paper, by comparing with other algorithms. Meanwhile, we intend to develop this platform as an applicatioin framework and foundation component of POMDP model.
Keywords/Search Tags:POMDP, Approximate Solution Algorithm, Instance-based, Reinforce- ment Learning, KBVI
PDF Full Text Request
Related items