
Policy Iteration Reinforcement Learning Based On Geodesic Gaussian Kernel

Posted on: 2016-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: C Yan    Full Text: PDF
GTID: 2308330479485820    Subject: Control Science and Engineering
Abstract/Summary:
As an important class of machine learning methods, classical reinforcement learning algorithms based on look-up tables suffer from the curse of dimensionality in tasks with large-scale or continuous state spaces. Approximate reinforcement learning overcomes this problem through approximation techniques and, as an active topic in the field, has been applied successfully in domains such as automatic control, artificial intelligence, and intelligent robotics.

This thesis studies approximate reinforcement learning from the perspective of value-function approximation. In approximate policy iteration based on linear value-function approximation, the choice of basis functions is crucial. To address this choice, the thesis adopts the geodesic Gaussian kernel, which is computationally simple and smooth on the state graph, to approximate both smooth and discontinuous value functions. To remedy shortcomings of reinforcement learning with geodesic Gaussian bases, two improved algorithms are proposed.

First, for tasks whose state spaces are not continuous in Euclidean space, geodesic Gaussian bases built from shortest paths computed directly in Euclidean space give unsatisfactory results. To solve this, the Laplacian eigenmap algorithm from manifold learning, which is grounded in spectral graph theory, is introduced, and a policy iteration reinforcement learning method based on manifold geodesic Gaussian bases defined on the state graph is proposed. Basis functions built from manifold distances improve the estimation accuracy of the value function, and the agent learns the optimal policy more quickly.
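The idea of a geodesic Gaussian basis can be sketched as follows. This is a minimal illustration, not the thesis code: the two-room layout, state names, doorway position, and width σ are all assumptions. The key point is that the Gaussian is shaped by graph shortest-path distance rather than Euclidean distance, so it does not "leak" through walls.

```python
from collections import deque
import math

def shortest_paths(adj, source):
    """Unweighted shortest-path (geodesic) distances from `source` on the state graph."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def geodesic_gaussian(adj, center, sigma):
    """phi_c(s) = exp(-d_graph(s, c)^2 / (2 sigma^2)), a Gaussian over graph distance."""
    d = shortest_paths(adj, center)
    return {s: math.exp(-d[s] ** 2 / (2 * sigma ** 2)) for s in d}

# Toy two-room world on a 2x4 grid: a wall separates columns 1 and 2
# except through a doorway in row 0 (states are (row, col) tuples).
states = [(r, c) for r in range(2) for c in range(4)]

def neighbors(s):
    r, c = s
    for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
        if (nr, nc) in states:
            # Wall between columns 1 and 2, open only at row 0 (the doorway).
            if {c, nc} == {1, 2} and r != 0:
                continue
            yield (nr, nc)

adj = {s: list(neighbors(s)) for s in states}
phi = geodesic_gaussian(adj, center=(1, 1), sigma=1.0)
```

Here the state (1, 2) is Euclidean-adjacent to the center (1, 1) but lies across the wall: its geodesic distance is 3 (up, through the doorway, down), so its activation is far smaller than that of the in-room neighbour (1, 0). A Euclidean Gaussian would assign both the same value, smearing the value function across the wall.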
Second, to reflect the actual differences and similarities between actions taken in the same state, basis functions are defined directly on the state-action graph. Because the single tunable parameter of the geodesic Gaussian basis limits the generalization performance of the approximated value function, a weighted Gaussian kernel with multiple widths is introduced, yielding a policy iteration reinforcement learning method based on multi-width weighted geodesic Gaussian bases defined on the state-action graph. Tuning the multiple parameters of this basis improves its learning and generalization ability and, in turn, the accuracy of the algorithm.

Both methods are simulated in two-room and four-room grid worlds. The simulation results demonstrate the validity and feasibility of the proposed methods.
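A multi-width weighted geodesic Gaussian can be sketched as a weighted sum of geodesic Gaussians with several widths. This is a hypothetical illustration under assumed parameters, not the thesis's exact formulation: the width and weight values below are invented for the example.

```python
import math

def multiwidth_geodesic_gaussian(d, widths, weights):
    """k(d) = sum_j w_j * exp(-d^2 / (2 * sigma_j^2)), where d is a
    geodesic (graph shortest-path) distance. Mixing a narrow and a wide
    component lets one basis function capture both sharp local structure
    and broad trends in the value function."""
    return sum(w * math.exp(-d * d / (2 * s * s))
               for w, s in zip(weights, widths))

# Assumed example parameters: one narrow and one wide component,
# with weights summing to 1 so that k(0) = 1.
widths, weights = [0.5, 3.0], [0.7, 0.3]
near = multiwidth_geodesic_gaussian(1.0, widths, weights)
far = multiwidth_geodesic_gaussian(6.0, widths, weights)
```

With a single width, one must choose between a narrow kernel that generalizes poorly and a wide kernel that blurs detail; the weighted sum gives several adjustable parameters (the widths and their weights), which is the extra freedom the thesis exploits to improve generalization.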
Keywords/Search Tags: Reinforcement learning, policy iteration, basis function, manifold space, geodesic Gaussian kernel