
Policy Iteration Reinforcement Learning Based On Geodesic Gaussian Kernel

Posted on: 2016-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: C Yan    Full Text: PDF
GTID: 2308330479485820    Subject: Control Science and Engineering
Abstract/Summary:
As an important class of machine learning methods, classical reinforcement learning algorithms based on look-up tables suffer from the curse of dimensionality in tasks with large-scale or continuous state spaces. Approximate reinforcement learning overcomes this problem through approximation techniques and, as an active topic in the field, has been applied successfully in domains such as automatic control, artificial intelligence, and intelligent robotics.

This thesis studies approximate reinforcement learning from the perspective of value-function approximation. In approximate policy iteration based on linear value-function approximation, the choice of basis functions is crucial. To address this choice, the thesis adopts the geodesic Gaussian kernel, which is computationally simple and smooth on the state graph, to approximate both smooth and discontinuous value functions. To remedy shortcomings of reinforcement learning with geodesic Gaussian bases, two improved algorithms are proposed.

First, for tasks whose state spaces are not continuous in Euclidean space, geodesic Gaussian bases built from shortest paths computed directly in Euclidean space give unsatisfactory results. To solve this, the Laplacian eigenmap algorithm from manifold learning, which is grounded in spectral graph theory, is introduced, and a policy iteration reinforcement learning method based on manifold geodesic Gaussian bases defined on the state graph is proposed. Basis functions built from manifold distances improve the estimation accuracy of the value function, and the agent learns the optimal policy more quickly.
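The idea of a geodesic Gaussian basis can be sketched as follows. This is a minimal illustration, not the thesis code: the two-room layout, state names, doorway position, and width σ are all assumptions. The key point is that the Gaussian is shaped by graph shortest-path distance rather than Euclidean distance, so it does not "leak" through walls.

```python
from collections import deque
import math

def shortest_paths(adj, source):
    """Unweighted shortest-path (geodesic) distances from `source` on the state graph."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def geodesic_gaussian(adj, center, sigma):
    """phi_c(s) = exp(-d_graph(s, c)^2 / (2 sigma^2)), a Gaussian over graph distance."""
    d = shortest_paths(adj, center)
    return {s: math.exp(-d[s] ** 2 / (2 * sigma ** 2)) for s in d}

# Toy two-room world on a 2x4 grid: a wall separates columns 1 and 2
# except through a doorway in row 0 (states are (row, col) tuples).
states = [(r, c) for r in range(2) for c in range(4)]

def neighbors(s):
    r, c = s
    for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
        if (nr, nc) in states:
            # Wall between columns 1 and 2, open only at row 0 (the doorway).
            if {c, nc} == {1, 2} and r != 0:
                continue
            yield (nr, nc)

adj = {s: list(neighbors(s)) for s in states}
phi = geodesic_gaussian(adj, center=(1, 1), sigma=1.0)
```

Here the state (1, 2) is Euclidean-adjacent to the center (1, 1) but lies across the wall: its geodesic distance is 3 (up, through the doorway, down), so its activation is far smaller than that of the in-room neighbour (1, 0). A Euclidean Gaussian would assign both the same value, smearing the value function across the wall.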
Second, to reflect the actual differences and similarities between actions taken in the same state, basis functions are defined directly on the state-action graph. Because the single tunable parameter of the geodesic Gaussian basis limits the generalization performance of the approximated value function, a weighted Gaussian kernel with multiple widths is introduced, yielding a policy iteration reinforcement learning method based on multi-width weighted geodesic Gaussian bases defined on the state-action graph. Tuning the multiple parameters of this basis improves its learning and generalization ability and, in turn, the accuracy of the algorithm.

Both methods are simulated in two-room and four-room grid worlds. The simulation results demonstrate the validity and feasibility of the proposed methods.
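A multi-width weighted geodesic Gaussian can be sketched as a weighted sum of geodesic Gaussians with several widths. This is a hypothetical illustration under assumed parameters, not the thesis's exact formulation: the width and weight values below are invented for the example.

```python
import math

def multiwidth_geodesic_gaussian(d, widths, weights):
    """k(d) = sum_j w_j * exp(-d^2 / (2 * sigma_j^2)), where d is a
    geodesic (graph shortest-path) distance. Mixing a narrow and a wide
    component lets one basis function capture both sharp local structure
    and broad trends in the value function."""
    return sum(w * math.exp(-d * d / (2 * s * s))
               for w, s in zip(weights, widths))

# Assumed example parameters: one narrow and one wide component,
# with weights summing to 1 so that k(0) = 1.
widths, weights = [0.5, 3.0], [0.7, 0.3]
near = multiwidth_geodesic_gaussian(1.0, widths, weights)
far = multiwidth_geodesic_gaussian(6.0, widths, weights)
```

With a single width, one must choose between a narrow kernel that generalizes poorly and a wide kernel that blurs detail; the weighted sum gives several adjustable parameters (the widths and their weights), which is the extra freedom the thesis exploits to improve generalization.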
Keywords/Search Tags: Reinforcement learning, policy iteration, basis function, manifold space, geodesic Gaussian kernel