Research On Multi-Agent Collaboration Based On Value Decomposition And Proximal Policy Optimization

Posted on: 2024-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Ma
Full Text: PDF
GTID: 2568307136989509
Subject: Control Science and Engineering
Abstract/Summary:
With the progress of technology, the problem-solving ability of a single agent can no longer meet practical needs, and multi-agent systems have become an important research area. Coordinated collaboration among multiple agents can accomplish more complex tasks, and multi-agent collaboration is a central topic in multi-agent systems research. The development of deep reinforcement learning has provided new methods for studying multi-agent collaboration, and many related results have appeared. However, existing methods such as policy gradient methods and value function methods still fall short in solving the credit assignment problem that arises when learning an optimal policy for multi-agent collaboration, which limits further improvement of collaboration capability. To better solve the complex credit assignment problem and to adapt to larger-scale collaboration scenarios, this thesis studies multi-agent collaboration based on value decomposition and proximal policy optimization under the framework of centralized training and distributed execution. The main contributions are as follows:

(1) To address the difficulty that existing policy gradient methods have with credit assignment in multi-agent collaboration, a multi-agent collaborative policy learning model and algorithm are proposed, based on proximal policy optimization with value decomposition that follows the advantage individual maximum principle. Each agent is assigned an individual critic that estimates its individual state value and advantage value. These estimates are integrated into the team's joint state-action value through a mixing network, and through training the agents obtain different individual state-action values, which implicitly performs credit assignment. Simulation experiments on a multi-agent predator-prey task show that the proposed model and algorithm effectively improve the collaboration ability of the agents.

(2) To address complex credit assignment and the difficulty of obtaining the optimal policy, a multi-agent collaborative policy learning model and algorithm based on weighted value decomposition and proximal policy optimization are proposed. A target critic that is not constrained by the network structure is learned to better estimate the true joint state-action values. The target critic is then used to evaluate the value-decomposable critic, and a weighting design places more emphasis on underestimated joint actions, so that optimal actions are not missed while the credit assignment problem is solved. Simulation experiments on a multi-agent predator-prey task that requires a complex collaborative policy show that the proposed model and algorithm enable the agents to obtain a better collaborative policy and apply to a wider range of scenarios.

(3) To improve the adaptability of the model and algorithm to collaborative scenarios with a larger number of agents, a model for processing complex input information is constructed based on a multi-head attention mechanism. Information from different agents is analyzed by multiple distinct attention heads, and different weights are assigned to the agents for task-specific information integration, which focuses on the more important information and reduces data dimensionality. Simulation experiments on multi-agent predator-prey tasks with increasing numbers of agents show that the attention-based input processing model introduced in this thesis effectively improves collaboration among a larger number of agents and makes the multi-agent collaborative policy learning model and algorithm adaptable to larger-scale multi-agent systems.
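The mixing step in contribution (1) and the weighting idea in contribution (2) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract gives no formulas, so every function name and the parameter alpha are hypothetical. The sketch assumes a linear mixer whose weights are made non-negative with an absolute value (one common way to keep the joint value monotonic in each individual value), and a weighting rule that gives full weight to samples whose mixed estimate lies below the unrestricted target critic's estimate and a smaller weight alpha otherwise.

```python
from typing import List


def monotonic_mix(individual_values: List[float],
                  raw_weights: List[float],
                  bias: float) -> float:
    """Combine per-agent values into one joint value (hypothetical linear mixer).

    abs() keeps every mixing weight non-negative, so the joint value never
    decreases when any single agent's value increases.
    """
    return sum(abs(w) * v for w, v in zip(raw_weights, individual_values)) + bias


def underestimation_weights(mixed: List[float],
                            target: List[float],
                            alpha: float = 0.5) -> List[float]:
    """Assumed weighting rule: 1.0 where the mixed estimate is below the
    target critic's estimate (an underestimated joint action), else alpha."""
    return [1.0 if m < t else alpha for m, t in zip(mixed, target)]


def weighted_mse(mixed: List[float],
                 target: List[float],
                 alpha: float = 0.5) -> float:
    """Mean of weighted squared errors between mixed and target estimates."""
    weights = underestimation_weights(mixed, target, alpha)
    total = sum(w * (m - t) ** 2 for w, m, t in zip(weights, mixed, target))
    return total / len(mixed)


if __name__ == "__main__":
    # Two agents, one sample: the joint value from two individual values.
    print(monotonic_mix([1.0, 2.0], [-0.5, 1.0], 0.25))
    # Two samples: the first is underestimated, the second is overestimated.
    print(weighted_mse([1.0, 3.0], [2.0, 2.0], alpha=0.5))
```

With alpha below 1, errors on underestimated joint actions dominate the loss, which matches the stated goal of not missing optimal actions. A learned mixing network would replace the fixed raw_weights and bias used here.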
Keywords/Search Tags: Multi-agent Collaboration, Deep Reinforcement Learning, Credit Assignment, Value Decomposition, Proximal Policy Optimization, Attention Mechanism