
Research On Actor-Critic Algorithm Based On The Bayesian Theory

Posted on: 2016-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: S C Chen
Full Text: PDF
GTID: 2308330464953266
Subject: Software engineering
Abstract/Summary:
Reinforcement learning is an important branch of machine learning with two remarkable characteristics: it is model-free and it learns online. By interacting with the environment, the agent receives feedback in the form of rewards and uses it to adjust and improve its policy, eventually obtaining the optimal policy. One of the difficulties of applying reinforcement learning in large or continuous domains is balancing exploration and exploitation. This paper first analyzes the exploration-exploitation problem and then proposes two algorithms that combine Actor-Critic methods with Bayesian theory. The main research can be summarized as follows:

(i) To address the "curse of dimensionality" in large or continuous domains and the difficulty of balancing exploration and exploitation, this paper proposes a novel Actor-Critic algorithm based on the Gaussian Process Temporal Difference method. In the Actor part, the algorithm constructs a mean squared error function with respect to the policy parameters. In the Critic part, a Gaussian process is used to model the linear state-value function, and the posterior distribution of the value function is obtained from the generated model by Bayesian inference. Empirical results show that the algorithm solves the balancing problem effectively and converges quickly.

(ii) A novel off-policy Actor-Critic algorithm based on Gaussian processes is also proposed. In the Actor part, the algorithm builds a novel probabilistic generative model of the action-value function and estimates the uncertainty of the value function with a Gaussian process. In the Critic part, the algorithm applies stochastic gradient descent to the projected Bellman error and uses eligibility traces to handle the temporal credit-assignment problem. Experiments show that the algorithm effectively improves convergence rate and accuracy.
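The thesis does not include code, but the Critic of contribution (i) can be illustrated with a minimal numpy sketch of the standard Gaussian Process Temporal Difference posterior (in the spirit of Engel et al.'s GPTD model, which relates rewards to values via r_t = V(s_t) - gamma*V(s_{t+1}) + noise). The RBF kernel, 1-D states, and all parameter names below are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def rbf_kernel(X, Y, ell=1.0):
    # Squared-exponential kernel between two sets of 1-D states.
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gptd_posterior(states, rewards, query, gamma=0.9, sigma=0.1, ell=1.0):
    """Posterior mean and variance of V at `query` states under the GPTD
    observation model  r_t = V(s_t) - gamma * V(s_{t+1}) + noise."""
    T = len(rewards)                          # transitions; len(states) == T + 1
    # H encodes the temporal-difference relation between consecutive values.
    H = np.zeros((T, T + 1))
    H[np.arange(T), np.arange(T)] = 1.0
    H[np.arange(T), np.arange(T) + 1] = -gamma
    K = rbf_kernel(states, states, ell)       # GP prior covariance over V(states)
    G = H @ K @ H.T + sigma**2 * np.eye(T)    # covariance of observed rewards
    A = rbf_kernel(query, states, ell) @ H.T  # cross-covariance, projected by H
    mean = A @ np.linalg.solve(G, rewards)    # posterior mean of V(query)
    # Posterior variance: prior variance minus the explained quadratic form.
    var = rbf_kernel(query, query, ell).diagonal() - np.einsum(
        "ij,jk,ik->i", A, np.linalg.inv(G), A)
    return mean, var

states = np.array([0.0, 1.0, 2.0, 3.0])
rewards = np.array([1.0, 1.0, 1.0])
mean, var = gptd_posterior(states, rewards, states)
```

The posterior variance is what makes the Bayesian Critic useful for exploration: states where `var` is still close to the prior variance are states the agent knows little about.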
Keywords: reinforcement learning, Actor-Critic algorithm, Gaussian process, Bayesian theory, continuous action space