
Research On The Fusion Of Different Memory Networks In Deep Reinforcement Learning

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Guan
Full Text: PDF
GTID: 2568307052483454
Subject: Computer application technology
Abstract/Summary:
Deep reinforcement learning combines traditional reinforcement learning with deep learning. It is a standard approach to high-dimensional decision-making tasks and has been applied widely across many fields with breakthrough results. However, conventional deep reinforcement learning performs poorly in games that require decisions over long time intervals, and in long-interval games that require the agent to navigate to fixed points. This thesis studies these problems as follows.

First, a deep recurrent Q-network incorporating gated recurrent units (GRUs) is proposed to address the agent's poor performance in long-interval decision-making. The agent can only base its decisions on a limited number of screens, so valuable information from earlier game frames is discarded: the input to the traditional deep Q-network consists of only the latest 4 frames, which makes it difficult for the agent to plan using information from dozens of frames in the past. This thesis therefore integrates deep reinforcement learning with gated recurrent units, using the recurrent network's ability to store past information to let the agent plan and act reasonably over long horizons. In experiments, the GRU-augmented deep reinforcement learning agent scores significantly higher than the original deep reinforcement learning algorithm in several games, demonstrating the effectiveness of the deep recurrent Q-network.

Building on this, the thesis proposes a memory recurrent Q-network that incorporates an external memory network, to address the agent's weakness in games that require planning fixed-point navigation in advance. Some games require the agent to reach a specific location at a specific time in order to avoid obstacles, and an agent trained on one level performs poorly when placed in a new level, lacking transfer-learning ability. A memory network can write and read its own memory, which supports transfer between similar maps, but it adapts poorly to dynamic environments. Combining an internal GRU-based memory with an external memory network allows the model both to adapt well to dynamic environments and to acquire transfer-learning ability. The memory recurrent Q-network scores about 20% higher than the deep recurrent Q-network on a specific game, and it also performs well on other, similar levels of the game, demonstrating considerable generalization ability.

Finally, the thesis summarizes the performance of the two models in various settings, and discusses the limitations of the current work and directions for future research.
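The abstract does not include code, but the core idea of the first model, replacing a fixed 4-frame stack with a recurrent hidden state that summarizes the whole frame history, can be sketched as follows. This is an illustrative toy in numpy, not the thesis's implementation: the class name `GRUQNet`, the dimensions, and the untrained random weights are all assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUQNet:
    """Toy GRU-based Q-network (illustrative, untrained).

    The GRU hidden state carries information from every past frame,
    so observations older than a fixed 4-frame window can still
    influence the Q-values."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        def mat(rows, cols):
            return rng.normal(0.0, 0.1, (rows, cols))
        # GRU parameters: update gate z, reset gate r, candidate state
        self.Wz, self.Uz = mat(hidden_dim, obs_dim), mat(hidden_dim, hidden_dim)
        self.Wr, self.Ur = mat(hidden_dim, obs_dim), mat(hidden_dim, hidden_dim)
        self.Wh, self.Uh = mat(hidden_dim, obs_dim), mat(hidden_dim, hidden_dim)
        # Linear head mapping the hidden state to per-action Q-values
        self.Wq = mat(n_actions, hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """One GRU update: h_t = (1 - z) * h_{t-1} + z * h_tilde."""
        z = sigmoid(self.Wz @ x + self.Uz @ h)          # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)          # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_tilde

    def q_values(self, frames):
        """Run the GRU over a sequence of frame features and compute
        Q-values from the final hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in frames:
            h = self.step(x, h)
        return self.Wq @ h

net = GRUQNet(obs_dim=8, hidden_dim=16, n_actions=4)
frames = [np.random.default_rng(i).normal(size=8) for i in range(12)]
q = net.q_values(frames)  # one Q-value per action
```

Unlike a 4-frame stack, perturbing even the very first frame of the sequence changes the final Q-values, which is the memory property the thesis exploits.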
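The second model's external memory can likewise be sketched as a content-addressable key-value store: the agent writes (key, value) pairs during play and later reads by similarity, which is what supports transfer between similar maps. Again a minimal sketch under stated assumptions; the class name `ExternalMemory`, the ring-buffer write policy, and the `sharpness` parameter are illustrative choices, not the thesis's design.

```python
import numpy as np

class ExternalMemory:
    """Minimal content-addressable external memory (illustrative).

    A fixed number of slots hold (key, value) pairs. A read returns a
    softmax-weighted sum of values by key-query similarity; a write
    overwrites the oldest slot (ring-buffer policy)."""

    def __init__(self, n_slots, key_dim, value_dim):
        self.keys = np.zeros((n_slots, key_dim))
        self.values = np.zeros((n_slots, value_dim))
        self.ptr = 0       # next slot to overwrite
        self.used = 0      # number of filled slots
        self.n_slots = n_slots

    def write(self, key, value):
        self.keys[self.ptr] = key
        self.values[self.ptr] = value
        self.ptr = (self.ptr + 1) % self.n_slots
        self.used = min(self.used + 1, self.n_slots)

    def read(self, query, sharpness=8.0):
        # Content-based addressing over the filled slots only.
        k = self.keys[: self.used]
        v = self.values[: self.used]
        scores = sharpness * (k @ query)
        w = np.exp(scores - scores.max())   # stable softmax
        w /= w.sum()
        return w @ v

mem = ExternalMemory(n_slots=4, key_dim=2, value_dim=3)
mem.write(np.array([1.0, 0.0]), np.array([1.0, 0.0, 0.0]))
mem.write(np.array([0.0, 1.0]), np.array([0.0, 1.0, 0.0]))
recalled = mem.read(np.array([1.0, 0.0]))  # close to the first value
```

Because reads are by content rather than by slot index, memories written on one map remain retrievable on a similar map; pairing this store with the GRU's internal state is the combination the thesis argues handles both transfer and dynamic environments.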
Keywords/Search Tags: Deep reinforcement learning, Memory network, Gated recurrent unit, Long short-term memory network