Research On Reinforcement Learning Based On Gaussian Process Regression

Posted on: 2015-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhuang
Full Text: PDF
GTID: 2268330428998522
Subject: Software engineering
Abstract/Summary:
Reinforcement learning is an important class of machine learning methods, particularly in the area of artificial intelligence. In the reinforcement learning framework, the agent learns how to map situations to actions through long-term interaction with the environment, so as to maximize the cumulative reward. Reinforcement learning methods have been widely applied in fields such as games, elevator dispatching, and robotics.

To address the "curse of dimensionality" that arises in large discrete state spaces and in continuous spaces, this thesis proposes algorithms based on value function approximation and compares the convergence performance of different algorithms. The main contributions are as follows:

(1) General value function approximation methods usually require the function to be assumed to have a specific form. This thesis instead uses Gaussian process regression, which does not require the function to follow a specific model. It represents the function implicitly, but rigorously, through the data themselves. The method is easy to implement, adapts its parameters automatically, and has a sound theoretical foundation. Gaussian process regression is also a form of supervised learning.

(2) Within the Dyna framework, the algorithm converges more slowly as the size of the discrete state space increases. In response to this problem, an optimized algorithm based on Gaussian process regression and state clustering is proposed. First, the scale of the discrete state space is reduced by clustering. Second, Gaussian process regression is used to evaluate the state values of the large-scale discrete state space. Experiments show that the algorithm effectively improves the convergence speed.

(3) A value iteration algorithm for continuous spaces based on Gaussian process regression, named GPRV, is proposed. In a continuous space, the value function cannot be stored in a table as it can in a discrete space. The algorithm therefore combines value iteration with a Gaussian process regression model to perform function approximation. The GPR-based algorithm evaluates state values effectively and shows better convergence in the pole-balancing experiment.
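The idea of combining value iteration with Gaussian process regression can be illustrated with a minimal Python sketch. This is not the GPRV algorithm from the thesis, whose details are not given in the abstract. The toy 1-D task (states in [0, 1], three actions, a quadratic reward centred at 0.5), the squared-exponential kernel, the hyperparameter values, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of states (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_fit(x, y, noise_var=1e-8):
    """Return alpha = (K + noise_var * I)^-1 y, computed via a Cholesky factorization."""
    K = rbf_kernel(x, x) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    return np.linalg.solve(L.T, np.linalg.solve(L, y))

def gp_mean(x_train, alpha, x_query):
    """GP posterior mean at the query states."""
    return rbf_kernel(x_query, x_train) @ alpha

def gp_value_iteration(n_support=11, gamma=0.9, n_iters=100):
    """Value iteration on a hypothetical 1-D task with a GP as value function."""
    states = np.linspace(0.0, 1.0, n_support)   # support states
    actions = np.array([-0.1, 0.0, 0.1])        # move left, stay, move right
    # Deterministic toy dynamics: s' = clip(s + a, 0, 1).
    next_states = np.clip(states[:, None] + actions[None, :], 0.0, 1.0)
    # Toy reward: the closer the next state is to 0.5, the better.
    rewards = -(next_states - 0.5) ** 2
    values = np.zeros(n_support)
    for _ in range(n_iters):
        # Regress the current value estimates onto the support states.
        alpha = gp_fit(states, values)
        # Query the GP at every successor state.
        v_next = gp_mean(states, alpha, next_states.ravel())
        v_next = v_next.reshape(next_states.shape)
        # Bellman optimality backup at each support state.
        values = np.max(rewards + gamma * v_next, axis=1)
    return states, values

states, values = gp_value_iteration()
for s, v in zip(states, values):
    print(f"V({s:.1f}) = {v:.4f}")
```

Because the value function is held as a regression model rather than a table, it can be queried at any state in [0, 1] with `gp_mean`, not only at the support states. In this sketch the successor states happen to coincide with the support states, which keeps the example small; a realistic continuous task would query the model between them.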
Keywords/Search Tags: reinforcement learning, function approximation, Gaussian process regression, Dyna framework, GPRV