Research Of Deep Reinforcement Learning In Real-Time Strategy Games

Posted on:2020-11-03

Degree:Master

Type:Thesis

Country:China

Candidate:X X Shen

GTID:2428330575998464

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, deep learning has not only advanced image detection, speech recognition, and natural language processing, but has also enabled new breakthroughs in reinforcement learning. With deep reinforcement learning, agents have surpassed human-level performance in video games, and the Go programs AlphaGo and AlphaZero have defeated human Go experts. However, deep reinforcement learning encounters bottlenecks when applied to more complex real-time strategy games. It faces two main problems. First, in the decision-making of an individual agent, the action value function is unstable in deep reinforcement learning algorithms based on value function iteration. Second, multiple agents must cooperate and compete with one another. Accordingly, the main work of this thesis is divided into the following two parts.

(1) An exponential moving averaging triple Q-network algorithm is proposed. This thesis analyzes the action value function in deep reinforcement learning and the instability of the action value function in the deep Q-network algorithm. Building on the averaged deep Q-network algorithm, it proposes an exponential moving averaged Q-network algorithm. Furthermore, this thesis analyzes the roles of the deterministic policy gradient algorithm and the deep Q-network algorithm within the deep deterministic policy gradient algorithm, and proposes the exponential moving averaging triple Q-network algorithm for the deep deterministic policy gradient algorithm. The core of the algorithm is to introduce the exponential moving averaged Q-network and to improve the gradient formula of the policy update in the deep deterministic policy gradient algorithm, which increases the robustness of the agent's behavior policy.

(2) A Q-network based on the self-attention mechanism is proposed. When multiple agents cooperate and compete, learning to attend to the features of more valuable agents or to the state of the game environment helps improve the agents' behavior policies. Drawing on the application of attention mechanisms in natural language processing, this thesis proposes a Q-network based on the self-attention mechanism to optimize the behavior policies of multiple agents so that they perform better.

With the exponential moving averaging triple Q-network algorithm proposed in this thesis, the agent obtains higher returns than with the original algorithm in multiple deep reinforcement learning environments with continuous action spaces. In a simulation environment that includes both cooperation and competition, the multi-agent deep deterministic policy gradient algorithm obtains higher returns when it uses the Q-network model based on the self-attention mechanism. The two improved algorithms are applicable not only to simulation environments and games but also to real-life scenarios, where they can improve agents' behavior policies and obtain more rewards.
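
The abstract does not give the update rule, so the following is only a minimal sketch of the general idea behind exponential moving averaging of action values, not the thesis's algorithm. It assumes Q-value estimates are available as plain lists, one value per action, taken from successive network snapshots. The function names `ema_update` and `smoothed_q` and the smoothing factor `beta` are hypothetical.

```python
def ema_update(avg_q, new_q, beta=0.9):
    """Blend the running average of Q-values with one new estimate.

    beta close to 1 keeps more of the history (assumed smoothing factor).
    """
    return [beta * a + (1.0 - beta) * q for a, q in zip(avg_q, new_q)]


def smoothed_q(estimates, beta=0.9):
    """Fold a sequence of per-action Q-value estimates into one EMA."""
    avg = list(estimates[0])  # start from the first snapshot
    for q in estimates[1:]:
        avg = ema_update(avg, q, beta)
    return avg


# Illustrative, made-up Q-value estimates for two actions. The first
# action's estimate oscillates between 1.0 and 3.0 across snapshots.
snapshots = [[1.0, 0.0], [3.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
smoothed = smoothed_q(snapshots, beta=0.5)
print(smoothed)
```

In this toy input the raw estimate for the first action swings by 2.0 between snapshots, while the averaged value moves by less than 1.0 per step, which illustrates the kind of stabilization the abstract attributes to averaging.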

Keywords/Search Tags:

Deep reinforcement learning, Real-time strategy games, Deep deterministic policy gradient algorithm, Exponential moving averaging, Agent, Action value function, Q-network, Self-attention mechanism

Related items

1	Research On Agent Decision-making And Control Based On Deep Reinforcement Learning
2	Research On Multi-agent Distributed Cooperation Method Based On Deep Reinforcement Learning
3	Research On Multi-agent Deep Reinforcement Learning In Non-globally Knowable Environment
4	Study Of Robot Arm Control Based On Deep Reinforcement Learning
5	Research And Application Of Twin Delayed Deep Deterministic Policy Gradient Algorithm
6	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
7	Deep Deterministic Policy Gradient Based On Entropy Regularization And Regular Update
8	Improvement And Application Of Deep Reinforcement Learning Based On Experience Replay Mechanism
9	Research On Improvement Method Of Experience Playback Mechanism In Deep Reinforcement Learning
10	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate