
Research Of Deep Reinforcement Learning In Real-Time Strategy Games

Posted on: 2020-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: X X Shen
GTID: 2428330575998464
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning has not only advanced image detection, speech recognition and natural language processing, but has also brought new breakthroughs in reinforcement learning. With the development of deep reinforcement learning, agents have surpassed human-level performance in video games, and the Go programs AlphaGo and AlphaZero have defeated human Go experts. However, deep reinforcement learning has begun to hit bottlenecks when applied to more complex real-time strategy games. The main problems fall into two areas. First, in an agent's own decision-making, the action-value function is unstable in deep reinforcement learning algorithms based on value-function iteration. Second, multiple agents must both cooperate and compete with one another. Accordingly, the main work of this thesis consists of the following two parts.

(1) An exponential moving averaging triple Q-network algorithm is proposed. This thesis analyzes the action-value function in deep reinforcement learning and the instability of the action-value function in the deep Q-network (DQN) algorithm, and, building on the averaged DQN algorithm, proposes an exponential moving averaged Q-network algorithm. It further analyzes the roles that the deterministic policy gradient algorithm and the DQN algorithm play within the deep deterministic policy gradient (DDPG) algorithm, and on that basis proposes the exponential moving averaging triple Q-network algorithm for DDPG. The core of the algorithm is to introduce the exponential moving averaged Q-network and to modify the policy-update gradient formula in DDPG, which makes the agent's behavior policy more robust.

(2) A Q-network based on the self-attention mechanism is proposed. When multiple agents
cooperate and compete, learning to attend to the features of the more valuable agents or to the state of the game environment helps improve their behavior policies. Drawing on how the attention mechanism is applied in natural language processing, this thesis proposes a Q-network based on self-attention to optimize the behavior policies of multiple agents so that they perform better.

With the proposed exponential moving averaging triple Q-network algorithm, the agent obtains higher returns than with the original algorithm in several deep reinforcement learning environments with continuous action spaces. In a simulation environment involving both cooperation and competition, the multi-agent deep deterministic policy gradient (MADDPG) algorithm obtains higher returns when it uses the self-attention-based Q-network model. The two improved algorithms are not limited to simulation environments or games; they can also be applied to real-life scenarios, improving agents' behavior policies and earning more reward.
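The abstract describes the exponential moving averaged Q-network only at a high level. As a minimal sketch, assuming it follows averaged DQN but replaces the uniform average over the last K Q-network snapshots with exponentially decaying weights, the helper below computes such an averaged estimate and a DDPG-style critic target from it. The names `EMAQEstimator`, `td_target`, `k` and `beta` are hypothetical, and the sketch does not reproduce the thesis's triple Q-network or its modified policy gradient, which the abstract does not specify.

```python
from collections import deque
from typing import Callable, Deque, Sequence

# A frozen Q-network snapshot: maps (state, action) to a scalar value estimate.
QFunction = Callable[[Sequence[float], Sequence[float]], float]


class EMAQEstimator:
    """Hypothetical exponentially weighted average over the K newest Q snapshots.

    The newest snapshot gets weight 1, the previous one beta, then beta**2, ...;
    the weights are normalised so they sum to 1.
    """

    def __init__(self, k: int, beta: float) -> None:
        if k < 1 or not 0.0 < beta <= 1.0:
            raise ValueError("need k >= 1 and 0 < beta <= 1")
        self.beta = beta
        # The deque silently drops the oldest snapshot once k are stored.
        self.snapshots: Deque[QFunction] = deque(maxlen=k)

    def push(self, q: QFunction) -> None:
        # Intended to be called after each critic update with a frozen copy.
        self.snapshots.append(q)

    def __call__(self, state: Sequence[float], action: Sequence[float]) -> float:
        if not self.snapshots:
            raise RuntimeError("no snapshots stored yet")
        # Index 0 is the newest snapshot, so it receives beta**0 = 1.
        weights = [self.beta ** i for i in range(len(self.snapshots))]
        values = [q(state, action) for q in reversed(self.snapshots)]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def td_target(reward: float, done: bool, next_state: Sequence[float],
              next_action: Sequence[float], q_ema: EMAQEstimator,
              gamma: float = 0.99) -> float:
    """Critic target r + gamma * (1 - done) * Q_ema(s', a').

    Assumption: next_action is the target actor's action mu'(s'), supplied by
    the caller, and the averaged estimate stands in for the target critic.
    """
    return reward + gamma * (1.0 - float(done)) * q_ema(next_state, next_action)
```

For example, with `beta = 0.5` and three stored snapshots that predict 1.0, 2.0 and 3.0 (oldest to newest), the averaged estimate is (3 + 0.5·2 + 0.25·1) / 1.75 = 17/7 ≈ 2.43, pulled toward the most recent critic but smoothed by older ones. Setting `beta = 1` recovers the uniform average over the last K estimates used by averaged DQN.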
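The abstract does not give the architecture of the self-attention Q-network. The following is a minimal pure-Python sketch under the assumption that each agent is first encoded as an embedding vector, that agents attend to one another with scaled dot-product self-attention, and that each agent's Q-value is a linear read-out of its own embedding concatenated with its attention context. All names (`self_attention`, `attention_q_values`, `wq`, `wk`, `wv`, `w_out`) are hypothetical, and a real critic would learn these weights inside a deep learning framework rather than take them as fixed inputs.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: each inner list is one output row


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(xs: Vector) -> Vector:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings: List[Vector], wq: Matrix, wk: Matrix,
                   wv: Matrix) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product self-attention across agents.

    Returns (contexts, weights): weights[i][j] is how strongly agent i attends
    to agent j, and contexts[i] is the weighted sum of the value vectors.
    """
    q = [matvec(wq, e) for e in embeddings]
    k = [matvec(wk, e) for e in embeddings]
    v = [matvec(wv, e) for e in embeddings]
    scale = math.sqrt(len(k[0]))  # sqrt of the key dimension
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    contexts = [[sum(w * vj[d] for w, vj in zip(wi, v)) for d in range(len(v[0]))]
                for wi in weights]
    return contexts, weights


def attention_q_values(embeddings: List[Vector], wq: Matrix, wk: Matrix,
                       wv: Matrix, w_out: Vector) -> Vector:
    """Hypothetical critic head: Q_i = w_out . [e_i ; c_i] for each agent i."""
    contexts, _ = self_attention(embeddings, wq, wk, wv)
    # e + c concatenates the agent's own embedding with its attention context.
    return [sum(w * x for w, x in zip(w_out, e + c))
            for e, c in zip(embeddings, contexts)]
```

The weight matrix returned by `self_attention` is what lets the critic pay more attention to the more relevant agents: row i shows how strongly agent i's value estimate depends on each agent. Because every agent shares the same projection matrices, the parameter count stays fixed regardless of how many agents are present.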
Keywords/Search Tags: Deep reinforcement learning, Real-time strategy games, Deep deterministic policy gradient algorithm, Exponential moving averaging, Agent, Action value function, Q-network, Self-attention mechanism