
Research On Value Function Approximation Methods In Reinforcement Learning

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: G X Chen
GTID: 2268330428498540
Subject: Computer software and theory

Abstract/Summary:
Reinforcement Learning (RL), characterized by the agent's "trial and error" interaction with the environment, is a kind of machine learning that requires no prior knowledge. It aims to find optimal policies that maximize the expected accumulated discounted reward. In reinforcement learning, the environment may have a large or even continuous state space, and the actions available to the agent may be discrete or continuous; both inevitably introduce complexity. Addressing the problems caused by large or continuous spaces, and starting from value function approximation, this paper proposes several value function approximation models and corresponding algorithms in order to remedy shortcomings of existing value function approximation methods. The main research comprises the following three parts:

(1) When Gaussian Process Temporal Difference (GPTD) learning is combined with the SARSA algorithm, the resulting algorithm tends to perform poorly and may even fail to find optimal policies. To address this problem, this paper uses a covariance function to establish a new generative model of the value function, models the value function with a linear function and a Gaussian Process, and applies Bayesian inference to estimate it, obtaining a fast-learning parametric Gaussian Process SARSA algorithm.

(2) Because GPTD learning is difficult to combine with the Q-Learning algorithm, this paper proposes a new probabilistic generative model of the value function for value iteration, again models the value function with a linear function and a Gaussian Process, and applies Bayesian inference to compute the posterior distribution of the value function parameters, yielding a Gaussian Process Q-Learning algorithm. This algorithm combines the advantages of Bayesian estimation and of the Q-Learning algorithm.

(3) To address the possible "curse of dimensionality" caused by a continuous action space, this paper adopts the Actor-Critic structure, models both the value function and the policy with linear functions, and uses the sigmoid of the temporal difference error to construct a mean-squared error over the policy parameters. Gradient descent and least squares methods are then used to minimize this error, giving the GDCAC and LSCAC algorithms. Both algorithms effectively avoid the "curse of dimensionality" caused by continuous action spaces and have high data efficiency.
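The Bayesian treatment in parts (1) and (2) rests on a parametric (linear) value model with a Gaussian prior over the weights. The sketch below shows the standard posterior computation for such a model; the feature map, prior variance, noise variance, and the way bootstrapped targets are formed are illustrative assumptions, not the thesis's exact generative model.

```python
import numpy as np

def bayesian_linear_value_posterior(Phi, targets, prior_var=1.0, noise_var=0.1):
    """Posterior over linear value-function weights w, where Q(s, a) ~ w^T phi(s, a).

    Phi     : (n, d) matrix of state-action features for observed transitions
    targets : (n,)   bootstrapped targets, e.g. r + gamma * Q(s', a') for a
              SARSA-style update or r + gamma * max_a Q(s', a) for a
              Q-Learning-style update (both are assumptions for illustration)
    Returns the posterior mean and covariance of w under a zero-mean Gaussian
    prior and Gaussian observation noise (standard Bayesian linear regression).
    """
    d = Phi.shape[1]
    prior_precision = np.eye(d) / prior_var
    # Posterior covariance and mean for conjugate Gaussian prior / likelihood.
    post_cov = np.linalg.inv(prior_precision + Phi.T @ Phi / noise_var)
    post_mean = post_cov @ (Phi.T @ targets) / noise_var
    return post_mean, post_cov

# Tiny usage example with synthetic features (illustrative only).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 8))
targets = Phi @ rng.normal(size=8) + 0.1 * rng.normal(size=50)
w_mean, w_cov = bayesian_linear_value_posterior(Phi, targets)
```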
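For part (3), the abstract only states that a mean-squared error is built from the sigmoid of the TD error and minimized by gradient descent (GDCAC) or least squares (LSCAC). The following is a minimal sketch of one plausible reading: a linear critic updated by TD(0) and a linear continuous-action actor nudged toward the explored action with a step weighted by sigmoid(TD error). The class name, step sizes, exploration scheme, and the precise objective are assumptions; the thesis's exact GDCAC/LSCAC construction may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LinearContinuousActorCritic:
    """Hypothetical gradient-descent actor-critic with linear approximators."""

    def __init__(self, n_features, alpha_w=0.1, alpha_theta=0.01,
                 gamma=0.95, explore_std=0.3, seed=0):
        self.w = np.zeros(n_features)      # critic weights: V(s) ~ w^T phi(s)
        self.theta = np.zeros(n_features)  # actor weights: u(s) = theta^T phi(s)
        self.alpha_w = alpha_w
        self.alpha_theta = alpha_theta
        self.gamma = gamma
        self.explore_std = explore_std
        self.rng = np.random.default_rng(seed)

    def act(self, phi_s):
        # Continuous action: linear mean plus Gaussian exploration noise.
        return self.theta @ phi_s + self.rng.normal(scale=self.explore_std)

    def update(self, phi_s, action, reward, phi_next, done):
        # TD(0) error from the linear critic.
        v_s = self.w @ phi_s
        v_next = 0.0 if done else self.w @ phi_next
        delta = reward + self.gamma * v_next - v_s
        # Critic: semi-gradient TD(0) step.
        self.w += self.alpha_w * delta * phi_s
        # Actor: gradient step on a sigmoid(delta)-weighted squared error
        # between the policy output and the explored action, so actions with
        # positive TD error pull the policy more strongly.
        weight = sigmoid(delta)
        self.theta += self.alpha_theta * weight * (action - self.theta @ phi_s) * phi_s
        return delta
```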
Keywords/Search Tags: reinforcement learning, value function approximation, Gaussian Process, SARSA algorithm, Q-Learning algorithm, continuous action space