Research On Multi-agent Collaboration And Formation Control Based On Deep Reinforcement Learning

Posted on: 2024-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 2568307064957809
Subject: Computer Science and Technology
Abstract/Summary:
With the industrialization of society and the rapid development of artificial intelligence, agents such as robots and drones are widely used in military and civilian fields. Because a single agent is limited in the tasks it can handle, multi-agent systems, with their distributed, redundant and efficient nature, are an effective way to handle large-scale, time-critical complex tasks. Formation control is a basic capability that multi-agent systems need in order to complete tasks. Compared with traditional formation control methods, methods based on deep reinforcement learning let the agent learn a control policy autonomously even when a precise environment model and the specific parameters of the agent’s kinematics/dynamics model are unknown. This thesis therefore studies multi-agent collaboration and formation control based on deep reinforcement learning. The main research content is as follows:

1. Traditional formation control methods need a precise environment model and agent kinematics/dynamics model in advance, but these models are usually hard to obtain because of their own complexity and external factors such as noise. To address this problem, a multi-agent formation control method that integrates a leader-follower architecture with MADDPG is proposed. Specifically, a Markov decision process model based on the leader-follower architecture is formulated for multi-agent formation control. Local observation and action representations are then designed for the leader and follower agents, and a reward function is designed according to the formation control task. Finally, the MADDPG algorithm is used to train the local motion policies of the leader and the followers simultaneously. Experimental results show that the proposed method lets the agents autonomously learn a stable formation control policy, which verifies its effectiveness. Compared with single-agent reinforcement learning algorithms such as DDPG, it handles sparse rewards better, converges faster, and yields a better-performing learned policy.

2. A multi-agent formation control method based on DQ-MADDPG is proposed. Specifically, DQ-MADDPG obtains two different Q values from two critic networks that both take global information as input but start from different initial network parameters. The smaller of the two Q values is used for the update, which avoids overestimating the Q value so that the agents can learn the optimal formation control policy. In addition, gradient clipping is used to avoid gradient explosion in the neural networks during training. Experimental results verify the effectiveness of the proposed algorithm, which outperforms MADDPG. In testing it showed better formation control ability, reflecting the superiority and robustness of DQ-MADDPG in both the convergence speed of the agents’ learning and the performance of the learned policy.

3. To address the fact that existing methods cannot make large-scale agent groups learn a stable formation control policy, a swarm formation control method based on mean field Q-learning is proposed. Using mean field theory, the interaction among N agents is abstracted as an interaction between two agents, which reduces the difficulty of learning each agent’s control policy. Experimental results verify the effectiveness of the proposed method: the learned policy can be applied to large-scale swarm formation control, and the swarm maintains its formation while moving. The advantage of this method is that it can train agents to learn a stable swarm formation control policy in a large group, and the learned policy can be applied to different group sizes, showing strong scalability.
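Two of the ideas named in the abstract can be illustrated with a minimal sketch: taking the smaller of two critic estimates when forming the update target, and summarizing many neighbors by their mean action. This is not the thesis's implementation. The function names, the default discount factor, the one-hot encoding of discrete actions, and the handling of an empty neighbor list are all assumptions made for illustration.

```python
from typing import List, Sequence


def min_q_target(reward: float, q1_next: float, q2_next: float,
                 gamma: float = 0.95, done: bool = False) -> float:
    """TD target built from the smaller of two critic estimates.

    q1_next and q2_next stand for the two critics' Q values at the
    next state (hypothetical inputs; the critics themselves are not
    modeled here). Using the minimum counteracts overestimation.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * min(q1_next, q2_next)


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Average of the neighbors' one-hot encoded discrete actions.

    The result summarizes all neighbors as a single "virtual" agent,
    so each agent only reasons about a pairwise interaction.
    """
    if not neighbor_actions:
        # Assumption: an agent with no neighbors sees an all-zero mean action.
        return [0.0] * n_actions
    counts = [0] * n_actions
    for action in neighbor_actions:
        counts[action] += 1
    return [c / len(neighbor_actions) for c in counts]
```

For example, min_q_target(1.0, 2.0, 3.0, gamma=0.5) bootstraps from the smaller estimate 2.0, and mean_action([0, 1, 1, 2], 3) reduces four neighbors to one distribution over three actions, whatever the group size.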
Keywords/Search Tags: Multi-agent system, Deep reinforcement learning, Formation control, Swarm formation control