Research On Planning In Partially Observable Domains Without Prior Knowledge

Posted on: 2020-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zheng
GTID: 2428330572982235
Subject: Control Engineering
Abstract/Summary:
Planning in stochastic and partially observable environments is a central issue in artificial intelligence. The standard approach is to first model the dynamics of the environment and then solve the planning problem with the obtained model. The Partially Observable Markov Decision Process (POMDP) is the mainstream modeling framework. However, a POMDP model is difficult to learn directly from data, so most related work assumes that an accurate POMDP model is known in advance. Although the Bayes-Adaptive POMDP (BA-POMDP) model has been proposed for learning an optimal policy under model uncertainty, it still needs prior knowledge about the environment to guarantee performance. In practice, it is unrealistic to expect an accurate model of the dynamical system or a large amount of prior knowledge in advance. The Predictive State Representation (PSR) offers another framework for modeling partially observable dynamical systems. Because a PSR model is built entirely on observable quantities, it can be learned from data without prior knowledge. However, there are few studies on how to use PSR models for planning. Building on the benefits of the PSR model, this thesis studies the problem of planning in partially observable domains without prior knowledge. The main contributions are summarized as follows:

(1) The Monte Carlo Tree Search (MCTS) algorithm is an effective planning algorithm. We first combine an offline-learned PSR model with the MCTS algorithm to solve the planning problem in partially observable domains without prior knowledge. Because the original PSR model does not contain the reward information of the dynamical system, it cannot be applied directly to planning. We extend the original PSR model to solve this problem by treating some reward signals of the underlying system as observations of the environment. The MCTS algorithm is then adjusted to fit the structure of the PSR model, and the two are combined into the proposed PSR-MCTS algorithm. The basic idea is to first learn a PSR model offline from training data and then use the learned model with the online MCTS algorithm for online planning.

(2) Learning the model offline often requires storing the entire training set and cannot make use of the data generated in the planning phase, which limits the application of such approaches. When training data is difficult to obtain, the agent must make effective decisions based on existing knowledge and use the data obtained during planning to improve the algorithm. Based on online spectral algorithms that can learn and update the PSR model online, we further improve the PSR-MCTS algorithm and propose the PSR-MCTS-Online algorithm. In PSR-MCTS-Online, both the learning and planning phases can be executed online in stochastic and partially observable environments, and no prior knowledge is required. The algorithm keeps using the data obtained in the planning phase to improve its performance, realizing online learning and planning from scratch.
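To make the idea of planning over a PSR with rewards folded into the observations more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the transformed PSR parameterization commonly used with spectral learning (an initial state `b1`, a normalizer `b_inf`, and one operator `B[(action, obs)]` per action-observation pair). All names, the single `listen` action, the reward-carrying observations `r0` and `r1`, and the numeric values are hypothetical toy choices written by hand rather than learned from data.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Operators = Dict[Tuple[str, str], Matrix]  # keyed by (action, observation)


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def obs_probability(b: Vector, b_inf: Vector, ops: Operators,
                    action: str, obs: str) -> float:
    """P(obs | history, action) = b_inf^T B_ao b."""
    return dot(b_inf, mat_vec(ops[(action, obs)], b))


def update_state(b: Vector, b_inf: Vector, ops: Operators,
                 action: str, obs: str) -> Vector:
    """PSR state update: b' = B_ao b / (b_inf^T B_ao b)."""
    unnormalized = mat_vec(ops[(action, obs)], b)
    norm = dot(b_inf, unnormalized)
    if norm <= 0.0:
        raise ValueError("observation has zero probability under the model")
    return [x / norm for x in unnormalized]


def expected_reward(b: Vector, b_inf: Vector, ops: Operators,
                    action: str, reward_of: Dict[str, float]) -> float:
    """Expected immediate reward when each observation encodes a reward."""
    return sum(r * obs_probability(b, b_inf, ops, action, o)
               for o, r in reward_of.items())


# Hypothetical toy model: two hidden modes, one action, two observations
# that double as reward signals (r1 carries reward 1, r0 carries reward 0).
B: Operators = {
    ("listen", "r1"): [[0.8, 0.0], [0.0, 0.2]],
    ("listen", "r0"): [[0.2, 0.0], [0.0, 0.8]],
}
B_INF: Vector = [1.0, 1.0]
B1: Vector = [0.5, 0.5]
REWARD_OF: Dict[str, float] = {"r0": 0.0, "r1": 1.0}

if __name__ == "__main__":
    state = update_state(B1, B_INF, B, "listen", "r1")
    print(state, expected_reward(state, B_INF, B, "listen", REWARD_OF))
```

In a tree-search planner, a function like `update_state` would be applied along each simulated action-observation path, with `obs_probability` used to sample the next observation and its reward. The search itself is omitted here.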
Keywords/Search Tags: Planning, PSR model, MCTS algorithm