
Research On Non-parametric Function Approximation Methods In Continuous Spaces

Posted on: 2015-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: W W Zhu
Full Text: PDF
GTID: 2268330428998560
Subject: Computer software and theory
Abstract/Summary:
Reinforcement learning is a trial-and-error learning method that can solve model-free problems: in the absence of any prior knowledge, the agent learns from its own experience gathered through constant interaction with the environment. This paper studies problems with continuous state and action spaces. The traditional approach is to discretize the state or action space, but guaranteeing a given precision inevitably produces a very large state or action space, which leads to the "curse of dimensionality" problem. Based on the Actor-Critic architecture, this paper proposes three Actor-Critic algorithms in which the critic uses non-parametric function approximation to cope with the curse of dimensionality in continuous state spaces, while the actor uses the policy gradient to select actions.

(1) To address the low sample efficiency of existing non-parametric methods, we propose a kernel-based recursive least squares AC algorithm. The actor runs a kernel-based policy gradient algorithm that approximates the true Q-value with a kernel function when estimating the policy gradient. The critic runs an ALD-based KRLSTD-Q algorithm, which makes full use of sample information while eliminating explicit matrix inversion (a recursive update of this kind is sketched after the abstract). The effectiveness of the algorithm is verified by simulation experiments on Mountain Car.

(2) In view of the effectiveness of the Gaussian kernel function, we propose a least squares support vector regression (LSSVR) AC algorithm. The actor uses the policy gradient algorithm; to make the method feasible, we also propose a scheme that keeps the sample sets used for policy evaluation and policy improvement compatible with each other. A data dictionary is obtained from the policy-evaluation sample set by ALD sparsification, the regression model of the V-value function is computed with LSSVR on that dictionary (both steps are sketched after the abstract), and the policy is improved on the policy-improvement sample set.

(3) Because the two algorithms above are offline and therefore not real-time, we propose an online GPTD-AC algorithm. The actor runs an online policy gradient algorithm that adapts to the growth of the kernel dictionary, which makes it suitable for non-parametric online learning (one such actor is sketched after the abstract); the critic uses the online GPTD algorithm to evaluate, in a timely fashion, the actions generated by the actor.
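Several of the components above rely on approximate linear dependence (ALD) sparsification to build the kernel dictionary. The abstract gives no code, so the following is only a minimal Python sketch of the standard ALD test (in the style of Engel et al.); the Gaussian kernel, the threshold nu, and all names are our own illustrative assumptions, not the thesis's implementation.

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        # Assumed RBF kernel; the thesis emphasizes the Gaussian kernel.
        return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

    def ald_dictionary(samples, nu=0.1, kernel=gaussian_kernel):
        """Return the subset of `samples` kept by the ALD test with threshold nu."""
        dictionary = [samples[0]]
        K_inv = np.array([[1.0 / kernel(samples[0], samples[0])]])
        for x in samples[1:]:
            k = np.array([kernel(d, x) for d in dictionary])  # kernel vector vs. dictionary
            c = K_inv @ k                                     # least-squares coefficients
            delta = kernel(x, x) - k @ c                      # ALD residual
            if delta > nu:
                # x is not approximately linearly dependent on the dictionary:
                # admit it and grow the inverse kernel matrix by block inversion.
                n = len(dictionary)
                K_inv_new = np.zeros((n + 1, n + 1))
                K_inv_new[:n, :n] = K_inv + np.outer(c, c) / delta
                K_inv_new[:n, n] = -c / delta
                K_inv_new[n, :n] = -c / delta
                K_inv_new[n, n] = 1.0 / delta
                K_inv = K_inv_new
                dictionary.append(x)
        return dictionary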
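The KRLSTD-Q critic in (1) is described only as recursive least squares that avoids matrix inversion. A generic recursive least-squares TD(0) update over kernel features anchored at the ALD dictionary might look like the sketch below; the discount gamma, the initialization of P, and the concatenated state-action feature map are assumptions, not the thesis's exact derivation.

    import numpy as np

    class KernelRLSTDQ:
        """Recursive least-squares TD(0) for Q-values over kernel features.

        Features are kernel evaluations of the concatenated (state, action)
        vector against a fixed ALD dictionary; a Sherman-Morrison-style
        recursion maintains the inverse matrix P, so no explicit matrix
        inversion is ever performed.
        """

        def __init__(self, dictionary, kernel, gamma=0.95, p_init=10.0):
            self.dictionary = dictionary   # list of concatenated (s, a) vectors
            self.kernel = kernel
            self.gamma = gamma
            self.theta = np.zeros(len(dictionary))
            self.P = p_init * np.eye(len(dictionary))

        def features(self, s, a):
            z = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)])
            return np.array([self.kernel(d, z) for d in self.dictionary])

        def update(self, s, a, r, s_next, a_next):
            phi = self.features(s, a)
            d = phi - self.gamma * self.features(s_next, a_next)  # TD feature difference
            g = self.P @ phi / (1.0 + d @ (self.P @ phi))         # gain vector
            self.theta += g * (r - d @ self.theta)                # TD-error correction
            self.P -= np.outer(g, d @ self.P)                     # rank-one update of P

        def q_value(self, s, a):
            return self.features(s, a) @ self.theta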
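For (2), fitting the LSSVR model of the V-value function on the dictionary reduces to solving a single linear system. The sketch below uses the standard LSSVR dual formulation with our own choice of regularization parameter C; the thesis's hyperparameters and kernel settings are not reproduced here.

    import numpy as np

    def lssvr_fit(X, y, kernel, C=10.0):
        """Fit LSSVR: solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
        n = len(X)
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / C
        rhs = np.concatenate(([0.0], np.asarray(y)))
        sol = np.linalg.solve(A, rhs)
        return sol[0], sol[1:]          # bias b, dual weights alpha

    def lssvr_predict(x, X, b, alpha, kernel):
        # V-hat(x) = sum_i alpha_i * k(x_i, x) + b
        return b + sum(a * kernel(xi, x) for a, xi in zip(alpha, X))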
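The online actor in (3) is characterized only as a policy-gradient method that adapts to the growth of the kernel dictionary. One common realization, sketched here purely as an assumption rather than the thesis's algorithm, is a Gaussian policy whose mean is a kernel expansion over the same dictionary: whenever the critic's sparsification admits a new kernel, the actor appends a zero weight and continues learning online.

    import numpy as np

    class KernelGaussianActor:
        """Gaussian policy with mean linear in kernel features of the state."""

        def __init__(self, kernel, sigma=0.5, lr=0.01):
            self.kernel = kernel
            self.sigma = sigma
            self.lr = lr
            self.dictionary = []   # state dictionary, grown alongside the critic's
            self.w = np.zeros(0)   # one weight per dictionary element

        def add_kernel(self, state):
            # Called when a new dictionary point is admitted: the policy
            # parameterization grows without disturbing existing weights.
            self.dictionary.append(state)
            self.w = np.append(self.w, 0.0)

        def features(self, s):
            return np.array([self.kernel(d, s) for d in self.dictionary])

        def act(self, s, rng=np.random):
            return rng.normal(self.features(s) @ self.w, self.sigma)

        def update(self, s, a, critic_value):
            # Policy-gradient step: grad log pi(a|s) = (a - mu) / sigma^2 * phi(s),
            # scaled by the critic's evaluation of the chosen action.
            phi = self.features(s)
            mu = phi @ self.w
            self.w += self.lr * critic_value * (a - mu) / self.sigma ** 2 * phi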
Keywords/Search Tags:Reinforcement Learning, Non-parametric Function Approximation, Actor-Critic, Policy Gradient, Least Squares