Research On Policy Gradient Methods Based On Functional Gradients

Posted on:2018-09-01

Degree:Master

Type:Thesis

Country:China

Candidate:P F Hou

Full Text:PDF

GTID:2348330512497701

Subject:Computer Science and Technology

Abstract/Summary:

Reinforcement learning is an important research field in machine learning. It studies how agents can improve their policies to maximize accumulated reward by interacting with the environment. Traditional reinforcement learning is mostly based on value functions. However, value-based methods have difficulty handling continuous-action tasks and can suffer from performance degradation. Consequently, policy search methods have developed significantly in recent years. Policy gradient methods, which update the policy along the gradient with respect to the policy parameters, play an important role in policy search. They usually represent policies with linear models, so the resulting system is limited by the representational ability of linear models. In supervised learning, functional gradients can generate nonparametric models, and Boosting methods based on functional gradients have become one of the representative approaches. Functional gradients, however, have seldom been studied in reinforcement learning. This thesis investigates how to apply functional gradients to policy gradient methods. The main contributions are as follows. First, we design PolicyBoost, a policy gradient method based on functional gradients. It can learn combinations of complex models such as decision trees, avoiding the manual design of linear features required by previous methods. Second, we prove the convergence of PolicyBoost under certain conditions. Theoretical analysis shows that overfitting can occur; we alleviate it by introducing baselines and constructing sampling pools. Finally, we run experiments on several classic reinforcement learning tasks, including Mountain Car and Acrobot, and on a challenging helicopter hovering control task, which verify that the algorithm performs well and stably.
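
The general idea of a functional-gradient policy update can be illustrated with a small sketch. This is not the thesis's PolicyBoost algorithm, whose details the abstract does not give. It assumes a Gibbs policy whose potential function is a sum of weak learners, a one-step toy task (`toy_reward`), regression stumps as the weak learners, and a mean-reward baseline; the sampling pools mentioned above are not modelled. All names here are hypothetical.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a least-squares regression stump on a 1-D input; return a callable."""
    best = None
    for thr in [i / 10 for i in range(1, 10)]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


class BoostedPolicy:
    """Gibbs policy pi(a|s) proportional to exp(psi(s, a)); psi is a sum of stumps."""

    def __init__(self, n_actions):
        # One list of (step size, weak learner) pairs per action.
        self.learners = [[] for _ in range(n_actions)]

    def psi(self, s, a):
        return sum(eta * h(s) for eta, h in self.learners[a])

    def probs(self, s):
        scores = [self.psi(s, a) for a in range(len(self.learners))]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        return [e / z for e in exps]


def toy_reward(s, a):
    """Hypothetical one-step task: action 1 is correct when s > 0.5, else action 0."""
    return 1.0 if a == int(s > 0.5) else 0.0


def train(rounds=30, batch=200, step=1.0, seed=0):
    rng = random.Random(seed)
    policy = BoostedPolicy(n_actions=2)
    for _ in range(rounds):
        samples = []
        for _ in range(batch):
            s = rng.random()
            p = policy.probs(s)
            a = 0 if rng.random() < p[0] else 1
            samples.append((s, a, toy_reward(s, a), p))
        # Mean-reward baseline to reduce the variance of the gradient estimate.
        baseline = sum(r for _, _, r, _ in samples) / batch
        xs = [s for s, _, _, _ in samples]
        for a_out in range(2):
            # Sampled functional gradient of expected reward w.r.t. psi(s, a_out):
            # (r - b) * d log pi(a|s) / d psi(s, a_out) = (r - b) * (1[a == a_out] - pi(a_out|s)).
            targets = [
                (r - baseline) * ((1.0 if a == a_out else 0.0) - p[a_out])
                for _, a, r, p in samples
            ]
            # Project the gradient onto the weak-learner class and take a step.
            policy.learners[a_out].append((step, fit_stump(xs, targets)))
    return policy
```

Calling `train()` and then `policy.probs(0.9)` should put most of the probability mass on action 1. No linear features are designed by hand here: the stumps pick their own split points, which is the property the abstract attributes to functional-gradient methods.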

Keywords/Search Tags:

Reinforcement Learning, Policy Gradient, Boosting, Convergence, Overfitting

Related items

1	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
2	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
3	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
4	Research On Regularized Policy Gradient
5	Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control
6	Optimization On Deep Reinforcement Learning Based On Policy Gradient
7	On the convergence of model-free policy iteration algorithms for reinforcement learning: Stochastic approximation under discontinuous mean dynamics
8	Research On Reinforcement Learning Methods Based On Direct Policy Search
9	Deep Deterministic Policy Gradient Based On Entropy Regularization And Regular Update
10	Research On Accelerating The Convergence Of Off-policy Temporal Difference Learning