Research On Policy Learning Via Imitation

Posted on:2016-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Qian

Full Text:PDF

GTID:2348330461958739

Subject:Computer application technology

Abstract/Summary:

Reinforcement learning is one of the major topics in machine learning. It aims to improve a learning system’s performance automatically through interaction with the environment. Its main bottleneck is the delay of the reward from the environment, which makes it hard to infer the optimal decision at each step; as a result, in many cases the system cannot learn the optimal policy efficiently. By introducing the guidance of a teacher, imitation learning provides a good reference for the decision at each step and alleviates the reward-delay problem, and it has therefore attracted growing attention from researchers in the past few years. This thesis studies policy learning under different demonstration settings and makes the following contributions. First, for the "Angry Birds" game, where the reward delay is weak, we propose a new policy learning method based on single-step demonstration data. The proposed method outperforms the champion of the 2012 "Angry Birds AI Competition". Second, for the "Ms. Pac-Man" game, where the reward delay is severe, we propose a new policy learning method based on multi-step demonstration data. Its performance is close to that of the DQN method published by the Google DeepMind research group in Nature in 2015. Finally, to address the high cost of collecting demonstration data from experts, we propose the RFPotential framework, which automatically learns a better shaping reward from self-generated demonstration data. Experiments show that the proposed method can effectively accelerate the reinforcement learning process.
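
The abstract does not spell out how RFPotential builds its shaping reward. As a point of reference only, the sketch below shows generic potential-based reward shaping, in which a term gamma * phi(s') - phi(s) is added to the environment reward. The function names `potential_from_demos` and `shaped_reward`, and the choice of estimating the potential as the mean discounted return seen per state in self-generated trajectories, are assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def potential_from_demos(trajectories, gamma=0.99):
    """Estimate a potential for each state as the mean discounted return
    observed from that state in the given trajectories.

    Each trajectory is a list of (state, reward) pairs. This estimator is a
    hypothetical stand-in, not the method described in the thesis.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for trajectory in trajectories:
        ret = 0.0
        # Walk backwards so the discounted return accumulates in one pass.
        for state, reward in reversed(trajectory):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def shaped_reward(reward, state, next_state, potential, gamma=0.99, done=False):
    """Return reward + gamma * phi(next_state) - phi(state).

    States missing from the potential table get potential 0, and so does
    the successor of a terminal transition.
    """
    phi_s = potential.get(state, 0.0)
    phi_next = 0.0 if done else potential.get(next_state, 0.0)
    return reward + gamma * phi_next - phi_s
```

In this sketch the shaping term gives the agent an immediate signal when it moves toward states that led to high return in earlier trajectories, which is one common way to soften the effect of delayed rewards.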

Keywords/Search Tags:

Machine Learning, Reinforcement Learning, Imitation Learning, Policy Gradient

Related items

1	Research On Reinforcement Learning Methods Based On Direct Policy Search
2	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
3	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
4	Research On Decision Distribution Modeling In Reinforcement Learning
5	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
6	Supervised Reinforcement Learning: Methods And Applications
7	Deep Deterministic Policy Gradient Based On Entropy Regularization And Regular Update
8	Inverse Reinforcement Learning And Imitation Learning With Applications In Intelligent Robotics
9	Research On Reinforcement Learning Method For Game Manipulation Behavior Imitation
10	Robust Policy Gradient Algorithm Based On Actor-Critic In Deep Reinforcement Learning