
Research On Value Function Approximation Methods In Reinforcement Learning

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: G X Chen
GTID: 2268330428498540
Subject: Computer software and theory

Abstract/Summary:
Reinforcement Learning (RL), characterized by the agent's "trial and error" interaction with the environment, is a kind of machine learning that requires no prior knowledge. It aims to find optimal policies that maximize the expected accumulated discounted reward. In reinforcement learning, the environment may have a large or even continuous state space, and the actions available to the agent may be discrete or continuous; both inevitably introduce complexity. Addressing the problems caused by large or continuous spaces, and starting from value function approximation, this paper proposes several value function approximation models and corresponding algorithms in order to remedy shortcomings of existing value function approximation methods. The main research comprises the following three parts:

(1) When Gaussian Process Temporal Difference (GPTD) learning is combined with the SARSA algorithm, the resulting algorithm tends to perform poorly and may even fail to find optimal policies. To address this problem, this paper uses a covariance function to establish a new generative model of the value function, models the value function with a linear function and a Gaussian Process, and applies Bayesian inference to estimate it, obtaining a fast-learning parametric Gaussian Process SARSA algorithm.

(2) Because GPTD learning is difficult to combine with the Q-Learning algorithm, this paper proposes a new probabilistic generative model of the value function for value iteration, again models the value function with a linear function and a Gaussian Process, and applies Bayesian inference to compute the posterior distribution of the value function parameters, yielding a Gaussian Process Q-Learning algorithm. This algorithm combines the advantages of Bayesian estimation and of the Q-Learning algorithm.

(3) To address the possible "curse of dimensionality" caused by a continuous action space, this paper adopts the Actor-Critic structure, models both the value function and the policy with linear functions, and uses the sigmoid of the temporal difference error to construct a mean-squared error over the policy parameters. Gradient descent and least squares methods are then used to minimize this error, giving the GDCAC and LSCAC algorithms. Both algorithms effectively avoid the "curse of dimensionality" caused by continuous action spaces and have high data efficiency.
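The Bayesian treatment in parts (1) and (2) rests on a parametric (linear) value model with a Gaussian prior over the weights. The sketch below shows the standard posterior computation for such a model; the feature map, prior variance, noise variance, and the way bootstrapped targets are formed are illustrative assumptions, not the thesis's exact generative model.

```python
import numpy as np

def bayesian_linear_value_posterior(Phi, targets, prior_var=1.0, noise_var=0.1):
    """Posterior over linear value-function weights w, where Q(s, a) ~ w^T phi(s, a).

    Phi     : (n, d) matrix of state-action features for observed transitions
    targets : (n,)   bootstrapped targets, e.g. r + gamma * Q(s', a') for a
              SARSA-style update or r + gamma * max_a Q(s', a) for a
              Q-Learning-style update (both are assumptions for illustration)
    Returns the posterior mean and covariance of w under a zero-mean Gaussian
    prior and Gaussian observation noise (standard Bayesian linear regression).
    """
    d = Phi.shape[1]
    prior_precision = np.eye(d) / prior_var
    # Posterior covariance and mean for conjugate Gaussian prior / likelihood.
    post_cov = np.linalg.inv(prior_precision + Phi.T @ Phi / noise_var)
    post_mean = post_cov @ (Phi.T @ targets) / noise_var
    return post_mean, post_cov

# Tiny usage example with synthetic features (illustrative only).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 8))
targets = Phi @ rng.normal(size=8) + 0.1 * rng.normal(size=50)
w_mean, w_cov = bayesian_linear_value_posterior(Phi, targets)
```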
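For part (3), the abstract only states that a mean-squared error is built from the sigmoid of the TD error and minimized by gradient descent (GDCAC) or least squares (LSCAC). The following is a minimal sketch of one plausible reading: a linear critic updated by TD(0) and a linear continuous-action actor nudged toward the explored action with a step weighted by sigmoid(TD error). The class name, step sizes, exploration scheme, and the precise objective are assumptions; the thesis's exact GDCAC/LSCAC construction may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LinearContinuousActorCritic:
    """Hypothetical gradient-descent actor-critic with linear approximators."""

    def __init__(self, n_features, alpha_w=0.1, alpha_theta=0.01,
                 gamma=0.95, explore_std=0.3, seed=0):
        self.w = np.zeros(n_features)      # critic weights: V(s) ~ w^T phi(s)
        self.theta = np.zeros(n_features)  # actor weights: u(s) = theta^T phi(s)
        self.alpha_w = alpha_w
        self.alpha_theta = alpha_theta
        self.gamma = gamma
        self.explore_std = explore_std
        self.rng = np.random.default_rng(seed)

    def act(self, phi_s):
        # Continuous action: linear mean plus Gaussian exploration noise.
        return self.theta @ phi_s + self.rng.normal(scale=self.explore_std)

    def update(self, phi_s, action, reward, phi_next, done):
        # TD(0) error from the linear critic.
        v_s = self.w @ phi_s
        v_next = 0.0 if done else self.w @ phi_next
        delta = reward + self.gamma * v_next - v_s
        # Critic: semi-gradient TD(0) step.
        self.w += self.alpha_w * delta * phi_s
        # Actor: gradient step on a sigmoid(delta)-weighted squared error
        # between the policy output and the explored action, so actions with
        # positive TD error pull the policy more strongly.
        weight = sigmoid(delta)
        self.theta += self.alpha_theta * weight * (action - self.theta @ phi_s) * phi_s
        return delta
```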
Keywords/Search Tags: reinforcement learning, value function approximation, Gaussian Process, SARSA algorithm, Q-Learning algorithm, continuous action space