
Research On Continual Reinforcement Learning Based On Hindsight And Progressive Expansion

Posted on: 2022-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: M X Du
Full Text: PDF
GTID: 2518306569997549
Subject: Computer Science and Technology
Abstract/Summary:
With the introduction of Deep Learning, Reinforcement Learning has achieved breakthrough progress. In recent years, Deep Reinforcement Learning, which combines the two, has become one of the mainstream directions of Artificial Intelligence. Deep Reinforcement Learning is a paradigm for Artificial General Intelligence through which most problems can be formalized, but it is still at an exploratory stage of research and cannot yet solve problems stably and effectively. On the one hand, DeepMind's AlphaGo agent defeated human players at Go, a complete-information game; on the other hand, because of partial observability and uncertainty, incomplete-information games have become a research focus in the field of machine game playing. This dissertation studies learning methods for Continual Reinforcement Learning agents in incomplete-information game environments, and proposes solutions to two problems in this setting: sparse rewards and multiple subtasks.

To address sparse rewards, this dissertation proposes a Reinforcement Learning method based on Direct Future Prediction trained with supervised signals. The abundant agent-state changes in an incomplete-information game scene are used as supervision signals in place of the reward signal of traditional Reinforcement Learning; each prediction network is trained by supervised regression, and action selection is combined with a goal-oriented Reinforcement Learning method. At the same time, a hindsight method is used to shape an off-policy hindsight experience pool, which mitigates the unevenness of supervision signals in incomplete-information games and improves the efficiency of the future-value prediction algorithm.

To address the problem of multiple subtasks in an incomplete-information game environment, this dissertation mainly uses a curriculum built on the inheritance relations between subtasks to learn each subtask step by step. To counter the catastrophic forgetting caused by knowledge transfer during curriculum learning, the Progressive Neural Network from the Continual Learning framework is introduced to dynamically expand the Direct Future Prediction network structure: old knowledge is reused while a new network column learns new knowledge, ensuring that previously learned knowledge is not forgotten. Because the discrete prediction networks of the Direct Future Prediction architecture are independent of one another, new prediction networks can be freely added or discarded when tasks differ in dimensionality, solving the problem of inconsistent action dimensions across tasks in a complex environment.

This dissertation uses the Pommerman game, an incomplete-information game environment, as the experimental test platform. First, the effectiveness of the baseline method is verified by comparison with several classic Reinforcement Learning methods. The effectiveness of the methods proposed in Chapter 3 and Chapter 4 is then verified through comparative ablation experiments. Finally, game tests against NIPS competition agents verify the performance improvement of the agent proposed in this dissertation.
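The hindsight shaping of an off-policy experience pool described above can be sketched roughly as follows. This is a minimal illustration in the style of hindsight experience replay, assuming a "future" goal-sampling strategy; the transition layout, `reward_fn`, and parameter `k` are illustrative assumptions, not the dissertation's exact design.

```python
import random

def relabel_with_hindsight(episode, reward_fn, k=4):
    """Relabel an episode's transitions with achieved goals (hypothetical sketch).

    episode: list of (state, action, achieved_goal, original_goal) tuples.
    reward_fn(achieved, goal) -> float, e.g. 0.0 if the goal is reached, else -1.0.
    Returns (state, action, goal, reward) tuples containing both the original
    sparse-reward transitions and the hindsight-relabeled ones.
    """
    buffer = []
    T = len(episode)
    for t, (s, a, ag, g) in enumerate(episode):
        # keep the original transition with its (usually sparse) reward
        buffer.append((s, a, g, reward_fn(ag, g)))
        # sample up to k goals actually achieved at this or a later step;
        # judged against these goals, the trajectory yields dense feedback
        future = list(range(t, T))
        for idx in random.sample(future, min(k, len(future))):
            new_g = episode[idx][2]
            buffer.append((s, a, new_g, reward_fn(ag, new_g)))
    return buffer
```

Relabeling in this way evens out the supervision signal: even an episode that never reaches its original goal still produces transitions whose goals were, in hindsight, achieved.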
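The progressive expansion of per-task prediction networks can be sketched as follows. This is a simplified sketch under stated assumptions: linear heads stand in for the prediction networks, lateral connections between columns are omitted, and the class name and methods (`add_task`, `freeze_task`) are hypothetical, not the dissertation's API.

```python
import numpy as np

class ProgressivePredictor:
    """Progressively expanded prediction heads over a shared feature vector.

    Heads for completed tasks are frozen, so later training cannot overwrite
    them (avoiding catastrophic forgetting), and each new head may use its
    own action dimension, so tasks with mismatched action spaces coexist.
    """

    def __init__(self, feature_dim):
        self.feature_dim = feature_dim
        self.heads = {}        # task_id -> weight matrix (action_dim x feature_dim)
        self.frozen = set()

    def add_task(self, task_id, action_dim, rng=None):
        # expand the model with a fresh, trainable head for the new task
        rng = rng or np.random.default_rng(0)
        self.heads[task_id] = rng.normal(
            scale=0.1, size=(action_dim, self.feature_dim))

    def freeze_task(self, task_id):
        # lock in the knowledge learned for a finished subtask
        self.frozen.add(task_id)

    def update(self, task_id, grad, lr=0.01):
        if task_id in self.frozen:
            raise ValueError(f"task {task_id} is frozen")
        self.heads[task_id] -= lr * grad

    def predict(self, task_id, features):
        return self.heads[task_id] @ features
```

Because each head is independent, adding a subtask with six actions after one with four requires no change to the earlier head, which mirrors how the independent prediction networks are discarded or expanded per task.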
Keywords/Search Tags: Incomplete Information Game, Reinforcement Learning, Continual Learning, Curriculum Learning