
Research On Actor-Critic Algorithm Based On The Bayesian Theory

Posted on: 2016-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: S C Chen
Full Text: PDF
GTID: 2308330464953266
Subject: Software engineering
Abstract/Summary:
Reinforcement learning is an important branch of machine learning with two remarkable characteristics: it is model-free and it learns online. By interacting with the environment, the agent receives feedback in the form of rewards and uses it to adjust and improve its policy, eventually obtaining the optimal policy. One of the difficulties of applying reinforcement learning in large or continuous domains is balancing exploration and exploitation. This paper first analyzes the exploration-exploitation problem and then proposes two algorithms that combine Actor-Critic methods with Bayesian theory. The main research can be summarized as follows:

(i) To address the "curse of dimensionality" in large or continuous domains and the difficulty of balancing exploration and exploitation, this paper proposes a novel Actor-Critic algorithm based on the Gaussian Process Temporal Difference method. In the Actor part, the algorithm constructs a mean squared error function with respect to the policy parameters. In the Critic part, a Gaussian process is used to model the linear state-value function, and the posterior distribution of the value function is obtained from the generated model by Bayesian inference. Empirical results show that the algorithm solves the balancing problem effectively and converges quickly.

(ii) A novel off-policy Actor-Critic algorithm based on Gaussian processes is also proposed. In the Actor part, the algorithm builds a novel probabilistic generative model of the action-value function and estimates the uncertainty of the value function with a Gaussian process. In the Critic part, the algorithm applies stochastic gradient descent to the projected Bellman error and uses eligibility traces to handle the temporal credit-assignment problem. Experiments show that the algorithm effectively improves convergence rate and accuracy.
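The thesis does not include code, but the Critic of contribution (i) can be illustrated with a minimal numpy sketch of the standard Gaussian Process Temporal Difference posterior (in the spirit of Engel et al.'s GPTD model, which relates rewards to values via r_t = V(s_t) - gamma*V(s_{t+1}) + noise). The RBF kernel, 1-D states, and all parameter names below are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def rbf_kernel(X, Y, ell=1.0):
    # Squared-exponential kernel between two sets of 1-D states.
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gptd_posterior(states, rewards, query, gamma=0.9, sigma=0.1, ell=1.0):
    """Posterior mean and variance of V at `query` states under the GPTD
    observation model  r_t = V(s_t) - gamma * V(s_{t+1}) + noise."""
    T = len(rewards)                          # transitions; len(states) == T + 1
    # H encodes the temporal-difference relation between consecutive values.
    H = np.zeros((T, T + 1))
    H[np.arange(T), np.arange(T)] = 1.0
    H[np.arange(T), np.arange(T) + 1] = -gamma
    K = rbf_kernel(states, states, ell)       # GP prior covariance over V(states)
    G = H @ K @ H.T + sigma**2 * np.eye(T)    # covariance of observed rewards
    A = rbf_kernel(query, states, ell) @ H.T  # cross-covariance, projected by H
    mean = A @ np.linalg.solve(G, rewards)    # posterior mean of V(query)
    # Posterior variance: prior variance minus the explained quadratic form.
    var = rbf_kernel(query, query, ell).diagonal() - np.einsum(
        "ij,jk,ik->i", A, np.linalg.inv(G), A)
    return mean, var

states = np.array([0.0, 1.0, 2.0, 3.0])
rewards = np.array([1.0, 1.0, 1.0])
mean, var = gptd_posterior(states, rewards, states)
```

The posterior variance is what makes the Bayesian Critic useful for exploration: states where `var` is still close to the prior variance are states the agent knows little about.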
Keywords: reinforcement learning, Actor-Critic algorithm, Gaussian process, Bayesian theory, continuous action space