Research On Multi-agent Cooperative Confrontation Method Based On Deep Reinforcement Learning

Posted on: 2021-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Du
Full Text: PDF
GTID: 2518306050965569
Subject: Detection Technology and Automation
Abstract/Summary:
Deep reinforcement learning, as a new type of artificial intelligence technology, has developed rapidly in recent years. When deep reinforcement learning is applied to multi-agent games, the stability (stationarity) conditions of the reinforcement learning environment are broken, and the cooperative and competitive relationships between agents, multi-agent decision-making, and motion planning have increasingly become the focus of research. To address these challenges, this thesis provides a complete solution spanning environment perception, motion planning, and decision confrontation. For each module, corresponding algorithms and implementations are proposed: a situational awareness algorithm based on point cloud matching, a motion planning algorithm based on deep reinforcement learning, and a multi-agent decision confrontation algorithm that fuses imitation learning and reinforcement learning. First, the situational awareness algorithm obtains not only obstacle information but also the global coordinates of the agents, supplying the key information needed for navigation and decision-making. Next, the motion planning algorithm based on deep reinforcement learning provides obstacle avoidance and point-to-point navigation for each agent. Finally, the multi-agent cooperative adversarial decision-making algorithm, which combines imitation learning and reinforcement learning, aims to provide the optimal solution for the multi-agent mixed game.

The main work and contributions of this thesis are as follows:

1. A situational awareness algorithm based on point cloud matching is proposed for multi-agent games. The algorithm first builds a vector map of the environment and, from the initial position of the lidar, generates a simulated lidar point cloud on that map, which is then matched against the physical lidar point cloud. Then, based on the resulting precise lidar position, the optimal simulated point cloud is generated and subtracted from the physical lidar point cloud. Processing the difference yields the positions of the other agents within the lidar's field of view. Finally, the model is deployed in the ROS environment for engineering use. Experimental results show that the proposed algorithm not only achieves situational awareness in a multi-agent game environment but also refines the agent's own localization, meeting real-time and stability requirements.

2. To meet the needs of navigation and obstacle avoidance, an end-to-end agent motion planning algorithm based on deep reinforcement learning is proposed. The agent's raw lidar point cloud and localization information are the network inputs, and the agent's velocity is the output. The lidar point cloud is first compressed and its features extracted by a convolutional network; the result is then fused with high-dimensional features such as position and passed to the subsequent network layers. As a high-dimensional feature, the navigation destination is fed into the network as a gating signal to emphasize the decisive role of high-dimensional features in navigation. An omnidirectional mobile robot is used as the navigation model, the network is trained with TensorFlow on the ROS platform, and the trained network is then deployed on a real vehicle. Experiments show that, in both simulated and real environments, the proposed network model achieves point-to-point navigation without colliding with obstacles along the way. Compared with the algorithm in the move_base navigation package in ROS, control frequency and navigation continuity are greatly improved.

3. In multi-agent mixed-game decision-making, two difficulties arise with the MADDPG algorithm: training multiple agents from scratch is hard, and there is no principled basis for designing the reward function. This thesis proposes a multi-agent mixed-game decision algorithm, GAIL-MADDPG. The agent first performs imitation learning from an expert policy and, once it has acquired basic skills, is further improved through reinforcement learning, which accelerates convergence. At the same time, the discriminator supplies the reward function for the agent, which removes the need to hand-design a multi-agent reward function without a principled basis. Finally, the algorithm is deployed and verified on the RoboMaster 2019 AI Challenge platform. Compared with MADDPG, the proposed GAIL-MADDPG algorithm not only requires no manually designed reward function but also more than doubles the convergence speed.
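As a rough illustration of the point cloud differencing idea in contribution 1, the sketch below casts simulated lidar beams against a vector map of wall segments, compares them with the measured ranges, and groups beams that return noticeably shorter than the map predicts into candidate agent positions. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the `WALLS` arena, the 0.3 m threshold, and the adjacent-beam clustering rule are all hypothetical, and the matching step that refines the lidar pose is omitted (the pose is assumed to be known already).

```python
import math

def ray_segment_distance(ox, oy, angle, seg, eps=1e-9):
    """Distance along a ray from (ox, oy) to a wall segment, or None on a miss."""
    (x1, y1), (x2, y2) = seg
    dx, dy = math.cos(angle), math.sin(angle)
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:  # ray is (numerically) parallel to the segment
        return None
    # Solve origin + t * d = p1 + u * e for t (along the ray) and u (along the segment).
    t = ((x1 - ox) * ey - (y1 - oy) * ex) / denom
    u = ((x1 - ox) * dy - (y1 - oy) * dx) / denom
    if t > 0.0 and -eps <= u <= 1.0 + eps:
        return t
    return None

def simulate_scan(pose, segments, n_beams=360, max_range=30.0):
    """Simulated lidar ranges from a known pose against a vector map."""
    x, y, theta = pose
    ranges = []
    for i in range(n_beams):
        angle = theta + 2.0 * math.pi * i / n_beams
        hits = [ray_segment_distance(x, y, angle, s) for s in segments]
        hits = [h for h in hits if h is not None]
        ranges.append(min(min(hits, default=max_range), max_range))
    return ranges

def detect_agents(pose, measured, simulated, threshold=0.3):
    """Estimate global (x, y) positions of objects that are absent from the map.

    A beam counts as 'foreign' when the measured range is shorter than the
    simulated range by more than `threshold`. Adjacent foreign beams form one
    cluster, and the mean point of each cluster is returned. Wrap-around
    between the last and first beam is ignored for brevity.
    """
    x, y, theta = pose
    n = len(measured)
    clusters, current = [], []
    for i in range(n):
        if simulated[i] - measured[i] > threshold:
            angle = theta + 2.0 * math.pi * i / n
            current.append((x + measured[i] * math.cos(angle),
                            y + measured[i] * math.sin(angle)))
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters]

# Hypothetical 10 m x 10 m arena described by four wall segments.
WALLS = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
         ((10, 10), (0, 10)), ((0, 10), (0, 0))]

if __name__ == "__main__":
    pose = (5.0, 5.0, 0.0)
    expected = simulate_scan(pose, WALLS)
    measured = list(expected)
    for i in range(88, 93):  # pretend another robot blocks five beams at 2 m
        measured[i] = 2.0
    print(detect_agents(pose, measured, expected))
```

In a real system the measured ranges would come from the physical lidar, and the residual points would likely need noise filtering before clustering; both are outside what the abstract specifies.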
Keywords/Search Tags: Multi-Agent System, Deep Reinforcement Learning, Environment Awareness, Motion Planning, Decision Confrontation
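The GAIL-MADDPG contribution in the abstract relies on a discriminator that supplies the reward in place of a hand-designed function. The sketch below shows only that one idea with a tiny logistic discriminator over state-action features. It is an illustration under assumptions, not the thesis's network: the class name, the linear model, the learning rate, and the convention that the discriminator outputs the probability of a pair being expert data (with surrogate reward -log(1 - D)) are all hypothetical choices, and the MADDPG actor-critic update is not shown.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class LogisticDiscriminator:
    """Tiny linear discriminator D(s, a) = sigmoid(w . [s, a] + b)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, sa):
        """Estimated probability that the state-action pair came from the expert."""
        return sigmoid(sum(w * x for w, x in zip(self.w, sa)) + self.b)

    def update(self, expert_batch, policy_batch):
        """One pass of per-sample SGD on binary cross-entropy (expert=1, policy=0)."""
        for batch, label in ((expert_batch, 1.0), (policy_batch, 0.0)):
            for sa in batch:
                err = self.prob_expert(sa) - label  # gradient of the loss w.r.t. the logit
                self.w = [w - self.lr * err * x for w, x in zip(self.w, sa)]
                self.b -= self.lr * err

    def reward(self, sa, eps=1e-8):
        """Surrogate reward -log(1 - D(s, a)); larger for expert-like pairs."""
        return -math.log(1.0 - self.prob_expert(sa) + eps)

def make_batch(center, n=50, noise=0.1):
    """Hypothetical 2-D state-action features scattered around `center`."""
    return [(center + random.gauss(0.0, noise), center + random.gauss(0.0, noise))
            for _ in range(n)]

if __name__ == "__main__":
    random.seed(0)
    expert = make_batch(1.0)    # stand-in for expert demonstrations
    policy = make_batch(-1.0)   # stand-in for the untrained agents' rollouts
    disc = LogisticDiscriminator(dim=2)
    for _ in range(20):
        disc.update(expert, policy)
    print(disc.reward((1.0, 1.0)), disc.reward((-1.0, -1.0)))
```

In a full pipeline the `reward` value would replace the environment reward in each agent's critic update, and the discriminator would be retrained as the policies change; how the thesis schedules these steps is not stated in the abstract.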