
A Study Of Reinforcement Learning Based On Factor Representation

Posted on: 2010-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Dai
Full Text: PDF
GTID: 2178360275984414
Subject: Computer application technology
Abstract/Summary:
Reinforcement Learning (RL) is an effective method for solving planning problems in stochastic environments. In large state spaces, however, and especially in application problems with complex stochastic states, the "curse of dimensionality" remains unsolved. Factored Reinforcement Learning, which extends RL in both the state space and the action space, has been shown to solve large-scale stochastic control problems more effectively and has been applied to AGV navigation. At present, almost all research focuses on preprocessing the state space before Reinforcement Learning is applied, while relatively little work addresses the Reinforcement Learning process itself. Based on this observation, the following aspects are investigated and discussed.

Firstly, the theoretical background and development of factored reinforcement learning are introduced. Four typical RL algorithms are discussed and compared, and empirical results are presented to show their differences and characteristics, which offers a basis for choosing the right algorithm in the subsequent work.

Secondly, a new Dynamic Programming (DP) method based on a factored representation is presented. When DP is used to solve complex RL problems, it is hard to compute the exact value of Vπ, so a linear approximation of Vπ is proposed to speed up the algorithm. Traditional RL stores the value function in a lookup table, which is highly redundant; a decision-tree representation is proposed instead and is examined in simulation experiments in this thesis.

Finally, a new TD(λ) algorithm based on a factored representation is presented. Its main principle is to represent states in factored form, to use Dynamic Bayesian Networks (DBNs) to represent the conditional probability distributions of Markov Decision Processes (MDPs), and to combine these with a decision-tree representation of the value function within the TD(λ) algorithm, thereby reducing state-space exploration and computational complexity. The algorithm is therefore promising for large-scale MDP problems with huge state spaces. The validity of this representation is demonstrated by experiments.
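To make the linear approximation of Vπ concrete, the following is a minimal sketch of linear value-function approximation for policy evaluation: Vπ(s) is approximated by a weighted sum of basis functions, each depending on only a few state variables. The tiny two-variable MDP, the basis functions, the transition matrix, and the use of a direct least-squares fixed-point solve are all illustrative assumptions, not the thesis's actual method details or benchmark.

```python
# A minimal sketch: approximate V_pi(s) as a weighted sum of basis
# functions over a factored state. All numbers here are assumed for
# illustration only.
import numpy as np

# Factored state: s = (x0, x1), each binary -> 4 flat states.
states = [(x0, x1) for x0 in (0, 1) for x1 in (0, 1)]

# Basis functions: a bias term plus one indicator per state variable.
def features(s):
    x0, x1 = s
    return np.array([1.0, float(x0), float(x1)])

# Assumed transition matrix P (under a fixed policy) and rewards R.
P = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
R = np.array([0.0, 1.0, 1.0, 2.0])
gamma = 0.9

Phi = np.vstack([features(s) for s in states])   # 4 x 3 feature matrix

# Least-squares fixed point of the projected Bellman equation:
# Phi w = projection of (R + gamma P Phi w), solved directly for w.
A = Phi.T @ (Phi - gamma * P @ Phi)
b = Phi.T @ R
w = np.linalg.solve(A, b)

print("weights:", w)
print("approx V:", Phi @ w)
```

Because the weight vector has one entry per basis function rather than one per state, the solve scales with the number of features, which is the source of the claimed speed-up over exact DP on the flat state space.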
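The decision-tree value representation can likewise be sketched. States that agree on the variables the tree actually tests share a single stored value, removing the redundancy of one lookup-table entry per state. The hand-built tree below is an illustrative assumption, not a tree learned by the thesis's algorithm.

```python
# A minimal sketch of a decision-tree value function: internal nodes
# test one state variable; leaves store values shared by all states
# reaching them. The tree below is hand-built for illustration.

# Internal node: ("split", variable_index, subtree_if_0, subtree_if_1)
# Leaf node:     ("leaf", value)
tree = ("split", 0,
        ("leaf", 0.0),               # x0 == 0: value independent of x1
        ("split", 1,
         ("leaf", 1.0),              # x0 == 1, x1 == 0
         ("leaf", 2.5)))             # x0 == 1, x1 == 1

def tree_value(tree, state):
    """Walk the tree, testing one state variable per internal node."""
    if tree[0] == "leaf":
        return tree[1]
    _, var, if0, if1 = tree
    return tree_value(if1 if state[var] else if0, state)

for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s, tree_value(s=s, tree=tree))
```

Here three leaves cover four states; as more variables irrelevant to the value are added, a lookup table grows exponentially while the tree stays the same size, which is the redundancy argument made above.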
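Finally, a sketch of TD(λ) over a factored model: each state variable's next value is sampled from its own conditional distribution, the role the thesis assigns to DBNs. For brevity this sketch reuses the linear feature representation from the first example rather than the thesis's decision-tree representation; the two-variable dynamics, reward, and learning constants are illustrative assumptions.

```python
# A minimal sketch of TD(lambda) with eligibility traces over a
# DBN-style factored transition model. Dynamics, reward, and
# hyperparameters are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
gamma, lam, alpha = 0.9, 0.8, 0.05

# Factored dynamics: each variable has its own conditional distribution
# given its parents, here P(x_i' = 1 | x0).
def step(s):
    x0, x1 = s
    x0_next = rng.random() < (0.8 if x0 else 0.2)   # x0' depends on x0
    x1_next = rng.random() < (0.7 if x0 else 0.3)   # x1' depends on x0
    return (int(x0_next), int(x1_next))

def reward(s):
    return float(s[0] + s[1])

def features(s):
    return np.array([1.0, float(s[0]), float(s[1])])

w = np.zeros(3)                 # linear value weights
for episode in range(200):
    s = (0, 0)
    z = np.zeros(3)             # eligibility trace
    for t in range(50):
        s_next = step(s)
        delta = reward(s) + gamma * features(s_next) @ w - features(s) @ w
        z = gamma * lam * z + features(s)   # accumulating trace
        w += alpha * delta * z
        s = s_next

print("learned weights:", w)
```

The factored `step` function never enumerates the joint state space, which is how the DBN representation keeps exploration and computation tractable as the number of state variables grows.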
Keywords/Search Tags: Factored RL, Model of Environment, DBN, Decision Tree, TD(λ) Algorithm