Research On Policy Gradient Methods Based On Functional Gradients

Posted on: 2018-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: P F Hou
GTID: 2348330512497701
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning is an important research field in machine learning. It studies how agents can improve their policies to maximize accumulated reward by interacting with the environment. Traditional reinforcement learning is mostly based on value functions. However, value-function methods have difficulty with continuous-action tasks and can suffer from performance degradation. Consequently, policy search methods have developed significantly in recent years. Policy gradient methods, which update the policy along the gradient with respect to the policy parameters, play an important role in policy search. They usually represent the policy with a linear model, so the resulting system is limited by the representational ability of linear models. In supervised learning, functional gradients can generate nonparametric models, and Boosting methods based on functional gradients have become one of the representative approaches. Functional gradients, however, have seldom been studied in reinforcement learning. This thesis investigates how to apply functional gradients to policy gradient methods. The main contributions are as follows. First, we design PolicyBoost, a policy gradient method based on functional gradients. It can learn combinations of complex models such as decision trees, avoiding the manual design of linear features that earlier methods required. Second, we prove the convergence of PolicyBoost under certain conditions. Theoretical analysis shows that overfitting can occur, and we alleviate it by introducing baselines and constructing sampling pools. Finally, experiments on several classic reinforcement learning tasks, including Mountain Car and Acrobot, and on a challenging helicopter hovering control task verify that the algorithm performs well and stably.
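The idea of a policy gradient step taken in function space can be illustrated with a small sketch. The code below is not the thesis's PolicyBoost algorithm, whose details the abstract does not give. It is a generic functional-gradient boosting loop under several assumptions: a softmax policy over an additive potential function, one-dimensional states, regression stumps as weak learners, the batch mean reward as the baseline, and a hypothetical one-step toy task in which action 1 is rewarded when the state is positive and action 0 otherwise. All names (`fit_stump`, `BoostedPolicy`, `run_toy_experiment`) are illustrative, and no sampling pool is used.

```python
import numpy as np


def fit_stump(x, y):
    """Least-squares regression stump on 1-D inputs.

    Returns (threshold, left_value, right_value); inputs <= threshold get left_value.
    """
    best = (np.inf, 0.0, y.mean(), y.mean())  # fallback: constant predictor
    for thr in np.unique(x)[:-1]:  # the largest value would leave the right side empty
        mask = x <= thr
        left, right = y[mask], y[~mask]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    return best[1:]


def stump_predict(stump, x):
    thr, left_value, right_value = stump
    return np.where(x <= thr, left_value, right_value)


class BoostedPolicy:
    """Softmax policy whose potential psi(s, a) is a sum of weak learners (assumed form)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.learners = []  # list of (step_size, [one stump per action])

    def potential(self, s):
        psi = np.zeros((len(s), self.n_actions))
        for step, stumps in self.learners:
            for k, stump in enumerate(stumps):
                psi[:, k] += step * stump_predict(stump, s)
        return psi

    def probs(self, s):
        psi = self.potential(s)
        psi -= psi.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(psi)
        return e / e.sum(axis=1, keepdims=True)

    def boost(self, s, a, r, step=1.0):
        """Add one weak learner fitted to the sampled functional gradient."""
        p = self.probs(s)
        onehot = np.eye(self.n_actions)[a]
        advantage = r - r.mean()  # batch mean reward as a simple baseline
        # Gradient of advantage * log pi(a|s) with respect to psi(s, .) at each sample.
        grad = advantage[:, None] * (onehot - p)
        stumps = [fit_stump(s, grad[:, k]) for k in range(self.n_actions)]
        self.learners.append((step, stumps))


def run_toy_experiment(iterations=20, batch=200, seed=0):
    """Hypothetical one-step task: reward 1 if the action equals (state > 0)."""
    rng = np.random.default_rng(seed)
    policy = BoostedPolicy(n_actions=2)
    for _ in range(iterations):
        s = rng.uniform(-1.0, 1.0, size=batch)
        p = policy.probs(s)
        a = (rng.random(batch) < p[:, 1]).astype(int)  # sample from the current policy
        r = (a == (s > 0)).astype(float)
        policy.boost(s, a, r)
    return policy
```

Calling `run_toy_experiment()` and then `policy.probs(np.array([-0.5, 0.5]))` returns the action probabilities at two test states. The policy is never written as a parameter vector: each update appends a new weak learner, which is what lets this family of methods use nonlinear models such as trees in place of hand-designed linear features.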
Keywords/Search Tags:Reinforcement Learning, Policy Gradient, Boosting, Convergence, Overfitting