Research On Policy Learning Via Imitation

Posted on: 2016-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qian
GTID: 2348330461958739
Subject: Computer application technology
Abstract/Summary:
Reinforcement learning is one of the major topics in machine learning. It aims to improve a learning system's performance automatically through interaction with the environment. Its main bottleneck is that the reward from the environment is delayed, which makes it hard to infer the optimal decision at each step. As a result, in many cases the system cannot learn the optimal policy efficiently. By introducing guidance from a teacher, imitation learning provides a good reference for the decision at each step and so alleviates the reward-delay problem. It has therefore attracted growing attention from researchers in recent years.

This thesis studies policy learning under different kinds of demonstration data and makes the following contributions. First, for the "Angry Birds" game, where reward delay is mild, we propose a new policy learning method based on single-step demonstration data. The proposed method outperforms the champion of the 2012 "Angry Birds AI Competition". Second, for the "Ms. Pac-Man" game, where reward delay is severe, we propose a new policy learning method based on multi-step demonstration data. Its performance is close to that of the DQN method published by the Google DeepMind research group in Nature in 2015. Finally, to address the high cost of collecting demonstration data from experts, we propose the RFPotential framework, which automatically learns a better shaping reward from self-generated demonstration data. Experiments show that the proposed method effectively accelerates the reinforcement learning process.
Keywords/Search Tags: Machine Learning, Reinforcement Learning, Imitation Learning, Policy Gradient