
Research On Deep Reinforcement Learning Of Agents In Multiple Game Scenarios

Posted on: 2020-09-03  Degree: Master  Type: Thesis
Country: China  Candidate: X G Nie  Full Text: PDF
GTID: 2428330623456620  Subject: Computer technology
Abstract/Summary:
Since the theory of deep reinforcement learning (DRL) was put forward, artificial intelligence has developed further and taken a big step toward real intelligence. As deep learning has matured, DeepMind, as a core research institution, has proposed a series of reinforcement learning algorithms such as DQN (Deep Q-Network), Double-DQN, and A3C (Asynchronous Advantage Actor-Critic), and these algorithms have beaten humans in many 2D and 3D games. However, they share some common problems, such as high complexity and heavy consumption of hardware resources, especially memory, GPU, and CPU. DQN-series algorithms rely on high-capacity experience pools and use experience replay to handle the temporal dependence between training samples effectively, but as a result they depend strongly on GPU, CPU, and memory, and consume a great deal of CPU time and memory during computation. A3C does not rely on a large-capacity experience pool; instead it exploits the multi-core mechanism of the CPU. In the A3C framework, each agent maintains a set of network parameters identical to that of the global neural network, and during training, parameter gradients and parameter copies are frequently transferred between the agents and the global network, which makes A3C rely heavily on memory and CPU. Considering these common problems of DRL models, namely the large number of parameters, heavy CPU and memory consumption, and strong dependence on the GPU, this thesis studies DRL agents in a variety of game scenarios. The research contents of this thesis include the following two aspects:

(1) A global mini-batch N-step DRL method based on A3C is proposed. To address the main problems of A3C, namely the large number of parameters, the heavy consumption of CPU and memory, and the frequent parameter copying and gradient transfer between agents and the global neural network, a DRL model called the global mini-batch N-step A3C (GMBN-A3C) is proposed. The model keeps only one global set of network parameters. Each agent asynchronously interacts with the environment to collect N-step interaction samples and stores each N-step trajectory as a whole in a very small global experience pool. During training, temporally correlated N-step samples are randomly selected, the gradients of the selected samples are computed jointly, and the network parameters are updated. Contrast experiments on various game scenarios show that the maximum interaction length and the number of agents have a great influence on the overall performance of GMBN-A3C; the CPU and memory consumption of GMBN-A3C is significantly lower than that of A3C, Double-DQN, and DQN, while its performance reaches the best performance of A3C.
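A minimal sketch of the GMBN-A3C update described above is given below. The names (GlobalNet, train_step), the network sizes, the pool capacity, and the loss form are illustrative assumptions for a single global actor-critic network with a small shared pool of N-step blocks, not the implementation used in this thesis.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalNet(nn.Module):
    # Single global actor-critic network; agents keep no local parameter copies.
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy = nn.Linear(64, n_actions)   # actor head
        self.value = nn.Linear(64, 1)            # critic head

    def forward(self, x):
        h = self.body(x)
        return self.policy(h), self.value(h)

# Very small global experience pool; each agent appends one whole N-step block
# (states, actions, rewards) after interacting with its environment.
pool = deque(maxlen=64)

def train_step(net, optimizer, batch_size, gamma=0.99):
    # Randomly select stored N-step blocks, compute one joint gradient, update once.
    batch = random.sample(list(pool), batch_size)
    loss = 0.0
    for states, actions, rewards in batch:
        # states: (N+1, obs_dim) tensor, last row used only for bootstrapping;
        # actions: (N,) LongTensor; rewards: list of N floats.
        logits, values = net(states)
        values = values.squeeze(-1)
        g = values[-1].detach()                  # bootstrap from the final state value
        returns = []
        for r in reversed(rewards):              # N-step discounted returns
            g = r + gamma * g
            returns.append(g)
        returns = torch.stack(list(reversed(returns)))
        adv = returns - values[:-1]
        logp = F.log_softmax(logits[:-1], dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss - (logp * adv.detach()).mean() + adv.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()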
(2) A deep reinforcement learning method based on multiple experience pools and a local-state parallel Q-network is proposed. Current DRL algorithms such as A3C and DQN use the whole game state as the model input, which results in a huge number of network parameters. DQN-series algorithms learn features of the entire game state, but different areas of the game state differ in how strongly they determine the predicted state value, and some areas determine it almost completely. Based on this observation, a DRL model called the multi-experience-pool local-state parallel Q-network (MEPLSPQ-Network) is proposed. The MEPLSPQ-Network contains several small-capacity experience pools, which further break up the temporal correlation between samples, and it improves the basic DQN network structure into a parallel structure, i.e., multiple Q-networks in parallel. The game state is divided into several non-overlapping areas, which serve as the inputs of the branch Q-networks. Each branch Q-network learns its fixed area of the game interface separately, and the features learned by the branches are finally combined. Contrast experiments on various game scenarios show that the number of branches of the parallel Q-network has a great impact on the performance of the model and that the MEPLSPQ-Network can effectively learn the characteristics of different parts of the game interface. Its overall performance is better than that of DQN, the training process is more stable, and the network parameters converge faster.
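A minimal sketch of the parallel branch structure and multi-pool sampling described above follows. The region split, branch sizes, number of pools, and fusion layer are illustrative assumptions, not the thesis implementation.

import random
from collections import deque

import torch
import torch.nn as nn

class MEPLSPQNetwork(nn.Module):
    # Each branch Q-network sees one fixed, non-overlapping region of the game state;
    # the branch features are concatenated before the final Q-value head.
    def __init__(self, region_dims, n_actions, hidden=64):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in region_dims
        )
        self.head = nn.Linear(hidden * len(region_dims), n_actions)

    def forward(self, regions):
        # regions: list of tensors, one per non-overlapping area of the game state.
        feats = [branch(r) for branch, r in zip(self.branches, regions)]
        return self.head(torch.cat(feats, dim=-1))

# Several small-capacity experience pools instead of one large replay buffer;
# drawing each mini-batch from a randomly chosen pool further breaks the
# temporal correlation between training samples (assumed sampling scheme).
pools = [deque(maxlen=500) for _ in range(4)]

def sample_batch(batch_size):
    candidates = [p for p in pools if len(p) >= batch_size]
    return random.sample(list(random.choice(candidates)), batch_size)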
Keywords/Search Tags:Deep reinforcement learning, GMBN-A3C, MEPLSPQ-Network