In recent years, multi-agent reinforcement learning has made remarkable progress and has been successfully applied to robot control, recommender systems, large-scale games, and other fields. Despite fruitful results in theory and practice, most existing algorithms struggle to achieve good results in complex cooperative multi-agent tasks, and all have certain limitations. First, existing algorithms are limited in their ability to model the rapidly changing cooperative relations between agents in complex cooperative tasks. Second, complex cooperative multi-agent tasks are inherently difficult: existing algorithms usually solve the entire task directly, which limits both solution quality and efficiency. To address these limitations and challenges, this paper focuses on cooperative multi-agent reinforcement learning. From the perspectives of cooperative relation modeling and role generation, it analyzes the limitations of existing algorithms and proposes improved methods. The main work and innovations are summarized as follows.

· To improve cooperation among agents, a multi-agent reinforcement learning algorithm based on weighted value decomposition is proposed from the perspective of implicit relation modeling. Mainstream multi-agent value decomposition methods consider only the monotonicity constraint and ignore the influence of cooperative relations; to address this, a cooperative relation weight learning mechanism based on multi-scale agent graph convolution is proposed to quantify how cooperative relations affect training of the value decomposition process. Building on this mechanism, a weighted value decomposition model is designed in which the learned weights guide the decomposition from the joint action value to each agent's local action value, achieving an implicit relation modeling process that benefits optimal policy solving. Experiments on a public multi-agent reinforcement learning benchmark show that implicit relation modeling based on weighted value learning significantly improves the performance of multi-agent policy learning.

· For explicit modeling of multi-agent cooperative relations, a multi-agent reinforcement learning algorithm based on spatio-temporal relation modeling is proposed. The main idea is to explicitly model cooperative relations between agents as weighted complete agent graphs, from which cooperative relation features are extracted directly and used in policy learning. To alleviate the noise that the massive number of cooperative relations introduces into learning, a weight learning mechanism over the agent graph is designed to efficiently mine and exploit high-value cooperative information. To adapt the weights to real-time changes in the situation, a spatio-temporal cooperative relation feature extraction module is designed and embedded to generate relation weights dynamically according to the spatio-temporal situation. Based on the weighted agent relation graph, a multi-agent policy learning mechanism built on graph convolutional networks supports the generation of efficient cooperative policies. Extensive experiments on the public multi-agent reinforcement learning benchmark show that this relation modeling effectively improves multi-agent cooperation.

· For multi-agent role generation, a multi-agent reinforcement learning algorithm based on role modeling is proposed. The main idea is a divide-and-conquer approach that decomposes the complex multi-agent learning problem into multiple subtasks, with one role responsible for solving each subtask. A role assignment model is first designed: the agents' historical state-action features are clustered, and the clustering results constitute the generated roles. Based on the role assignments, a role feature extraction model built on graph convolutional networks models the relations between agents at the role level and extracts valuable role features. Finally, a role-driven multi-agent policy learning mechanism encodes role features into role hidden features through a hidden feature network, which then generates the parameters of the policy network. In this way, the role information contained in the role features is embedded into agent policy learning to improve the efficiency of subtask solving. Experiments on the public multi-agent reinforcement learning benchmark show that dynamic role generation greatly improves both the efficiency of solving the subtasks corresponding to roles and the efficiency of policy learning for complex tasks.

· Although agent role modeling achieves the decomposition of multi-agent tasks, generating different coping strategies for different subtasks requires further study. This paper therefore proposes a multi-agent reinforcement learning algorithm for heterogeneous skills from the perspective of heterogeneous skill learning. By introducing the concept of skills, hidden skill features generated from environmental information serve as the skill representation, and a skill selection model based on a deep reinforcement learning algorithm is designed on top of them. The skill selection model realizes a many-to-many selection from agents to skills based on agent state features and skill hidden features. Meanwhile, mutual information is introduced into the learning process to obtain diverse and discriminative skills: the mutual-information objective is expressed as an intrinsic reward that, together with rewards from the environment, drives skill-oriented policy learning. Experiments on two public multi-agent reinforcement learning benchmarks show that the algorithm achieves state-of-the-art performance in all tasks, demonstrating the effectiveness of learning policies based on diverse skills.
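The weighted value decomposition idea in the first contribution can be illustrated with a minimal sketch: per-agent local action values are mixed into a joint value through nonnegative weights derived from inter-agent relations, so the mixing remains monotonic in each local value. This is an assumption-laden toy, not the thesis's actual architecture; the function names are hypothetical, and a simple dot-product similarity plus softmax stands in for the multi-scale agent graph convolution that the thesis uses to learn relation weights.

```python
import numpy as np

def relation_weights(agent_feats):
    """Derive one nonnegative weight per agent from pairwise relation scores.

    A dot-product similarity is a stand-in for the thesis's multi-scale
    agent graph convolution; the softmax guarantees weights >= 0.
    """
    scores = agent_feats @ agent_feats.T        # (n, n) pairwise relations
    per_agent = scores.mean(axis=1)             # aggregate each agent's relations
    e = np.exp(per_agent - per_agent.max())     # stable softmax
    return e / e.sum()

def weighted_value_decomposition(local_q, agent_feats):
    """Mix local action values into a joint value.

    Because every weight is nonnegative, the joint value is monotonically
    non-decreasing in each agent's local value (the monotonicity constraint).
    """
    w = relation_weights(agent_feats)
    return float(w @ local_q)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                 # 4 agents, 8-dim features
local_q = np.array([1.0, 2.0, 0.5, 3.0])        # per-agent local action values
q_tot = weighted_value_decomposition(local_q, feats)
print(round(q_tot, 3))
```

Since the weights form a convex combination here, the joint value always lies between the smallest and largest local value; a learned mixing network, as in the actual method, would be strictly more expressive.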