
Research On Applying Deep Reinforcement Learning In Pursuit-evasion Problem

Posted on: 2020-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Huang
Full Text: PDF
GTID: 2428330590483131
Subject: Control Engineering
Abstract/Summary:
As one of the most representative artificial intelligence algorithms, deep reinforcement learning combines the feature-extraction ability of deep neural networks with the exploration ability of reinforcement learning. Reinforcement learning supplies samples to the neural network by exploring the environment, and the neural network, trained on those samples, guides the direction of further exploration. Deep reinforcement learning thus provides a general framework for self-learning agents and makes end-to-end learning possible.

Because it involves both cooperation and competition, the pursuit-evasion problem has long been a classical problem in the multi-agent field. In multi-agent problems, designing the parameters of an agent's controller is much harder when the agent's mathematical model is unknown. Deep reinforcement learning lets the agent learn its controller parameters by itself, which also avoids introducing subjective factors into the agent's strategy.

We apply a multi-agent deep reinforcement learning algorithm to the pursuit-evasion problem in a limited space. To address a defect of the deterministic policy gradient in this environment, we present a new method that clips different action spaces, named 'Multi-Agent Deep Deterministic Policy Gradient with Clipped Action Space' (MADDPG-DAS). Furthermore, the reward function and activation function are modified to suit the pursuit-evasion environment. A prioritized replay buffer and proximal policy optimization are also used to accelerate convergence and improve stability.

The experimental results indicate that MADDPG-DAS solves the misleading-gradient problem caused by the illegal action-value function in the Deterministic Policy Gradient algorithm. Compared with MADDPG, both kinds of agents trained by MADDPG-DAS show higher flexibility and perform better in our environment. In addition, this thesis tests the limiting condition under which the evader can successfully escape. The results show that the evader can find the strategy best suited to its advantages in different situations, which offers a solution for similar limiting-condition problems.
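The abstract mentions a prioritized replay buffer but gives no details of how it was built. As a rough illustration only, the following is a minimal sketch of generic proportional prioritized replay, not the thesis's implementation. The class name, the `alpha`, `beta`, and `eps` parameters and their default values, and the simple O(n) sampling are all assumptions made for this example.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully priority-driven
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current maximum priority,
        # so they are likely to be sampled before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # here they are normalised by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priority becomes the magnitude of the TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop of this kind, the sampled weights would typically scale each transition's critic loss, and `update_priorities` would be called with the new TD errors after each update. A production version would usually replace the linear scan with a sum tree.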
Keywords/Search Tags: Reinforcement Learning, Deep Learning, Deep Reinforcement Learning