Research On Applying Deep Reinforcement Learning In Pursuit-evasion Problem

Posted on:2020-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Huang

Full Text:PDF

GTID:2428330590483131

Subject:Control Engineering

Abstract/Summary:

Deep reinforcement learning, one of the most representative families of artificial intelligence algorithms, combines the feature-extraction ability of deep neural networks with the exploration ability of reinforcement learning. Reinforcement learning supplies samples to the neural network by exploring the environment, and the neural network, trained on those samples, guides the direction of further exploration. Deep reinforcement learning thus provides a general framework for self-learning agents and makes end-to-end learning possible. Because it involves both cooperation and competition, the pursuit-evasion problem has long been a classical problem in the multi-agent field. In multi-agent problems, designing the parameters of an agent's controller is much harder when the agent's mathematical model is unknown. Deep reinforcement learning lets an agent learn its controller parameters by itself, which also avoids introducing subjective factors into the agent's strategy. We apply a multi-agent deep reinforcement learning algorithm to the pursuit-evasion problem in a bounded space. To address a defect of the deterministic policy gradient in this environment, we propose a new method that clips the different action spaces, named 'Multi-Agent Deep Deterministic Policy Gradient with Clipped Action Space' (MADDPG-DAS). Furthermore, the reward function and the activation function are modified to suit the pursuit-evasion environment. A prioritized replay buffer and proximal policy optimization are also used to accelerate convergence and improve stability. The experimental results indicate that MADDPG-DAS solves the misleading-gradient problem caused by the illegal action-value function in the deterministic policy gradient algorithm. Compared with MADDPG, both kinds of agents trained with MADDPG-DAS show higher flexibility and perform better in our environment. In addition, this thesis tests the limiting conditions under which the evader can successfully escape. The results show that the evader can find the strategy best suited to its advantages in different situations, which offers a solution for similar limiting-condition problems.
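
The abstract does not say which form of prioritized replay the thesis uses. As an illustration only, the following is a minimal sketch of the common proportional variant, in which a transition is sampled with probability proportional to its priority raised to a power alpha, and importance-sampling weights correct the resulting bias. The class name, method names, and default values (alpha, beta, eps) are assumptions made for this example and are not taken from the thesis.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the thesis code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity   # maximum number of stored transitions
        self.alpha = alpha         # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps             # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0               # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weight w_i = (N * P(i))^(-beta),
        # normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, the priority becomes the absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop under these assumptions, the critic loss of each sampled transition would be multiplied by its weight, and `update_priorities` would be called with the new TD errors after each update.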

Keywords/Search Tags:

Reinforcement Learning, Deep Learning, Deep Reinforcement Learning

Related items

1	Research On Security Deep Reinforcement Learning Based On Experiences
2	Research On Stock Trading Based On Deep Reinforcement Learning
3	Research On Group Confrontation Strategies Based On Deep Reinforcement Learning
4	Research And Implementation Of Stock Quantitative Trading Algorithm Based On Deep Reinforcement Learning
5	Research On Reinforcement Learning Based Control Method Of Magnetic Navigation AGV
6	Research On Applying Deep Reinforcement Learning In Pursuit-evasion Problem
7	Research On Applying Deep Reinforcement Learning In Image Based Control And Image Classification Tasks
8	Research On Deep Reinforcement Learning Algorithm In Continuous Action On Space
9	A Research Of Deep Reinforcement Learning Algorithms In Combination With Multi-relations
10	The Research On The Application Of Deep Learning In Reinforcement Learning