In recent years, the rapid development of deep learning has injected new vitality into reinforcement learning, opening new avenues for solving complex tasks. Most current research in deep reinforcement learning focuses on single-agent systems. In the real world, however, completing many tasks requires cooperation among multiple agents. In a multi-agent system, the policies and actions of each agent can affect the states of other agents, giving rise to non-stationarity, partial observability, and the curse of dimensionality, all of which pose significant challenges to collaboration among agents. It is therefore crucial to explore how to guide collaboration among agents so as to accelerate task completion and, building on this, to promote exploration of the environment by multi-agent systems. This paper studies multi-agent cooperation methods based on deep reinforcement learning. Its main contributions are as follows:

(1) A multi-agent reinforcement learning collaboration method based on altruistic rewards is proposed. The method aims to encourage agents to behave in ways that benefit other agents, that is, to behave altruistically, thereby promoting cooperation among agents. Specifically, the altruistic behavior of an agent is defined as an action that changes the future expectations of other agents in the environment for the better, and the degree of this improvement is computed through counterfactual reasoning. When an agent performs altruistic behavior, it receives an altruistic reward. Experiments are conducted in multiple collaboration scenarios of varying complexity. The results show that, compared with the baseline methods, this method better promotes collaboration among agents, and the agents obtain greater rewards in the same amount of time.

(2) Current multi-agent reinforcement learning methods have shortcomings in exploration. These methods typically employ greedy policies, which are likely to lead to inadequate exploration of the environment by the agents. In addition, the exploration rate of all agents in the same environment is usually set to the same value, which leaves the agents insufficiently adaptable to dynamic changes in the environment and to the behaviors of other agents in complex tasks. To address these issues, a multi-agent adaptive exploration method based on deep reinforcement learning is proposed. The agents' exploration is adjusted at two different time scales. At each time step, the policy entropy of each agent is calculated, and actions are chosen probabilistically based on the size of the policy entropy or on the state-action value. Over a longer period, each agent's exploration rate is adjusted by referencing the performance and exploration degree of the other agents in the environment. The proposed method is tested in multiple experimental scenarios in StarCraft II. The results show that it promotes exploration of the environment by the agents, allowing them to discover and learn better strategies and ultimately achieve greater rewards.
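
To make the two time scales in (2) concrete, the following Python sketch shows one possible reading of the idea. It is not the paper's implementation. The function names, the softmax policy over state-action values, the rule that explores with probability `exploration_rate * entropy`, and the comparison against the peers' mean return are all assumptions introduced for illustration only.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn state-action values into a probability distribution."""
    top = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def select_action(q_values, exploration_rate, rng=random):
    """Per-step rule (assumed): explore with probability rate * entropy."""
    probs = softmax(q_values)
    explore_prob = exploration_rate * normalized_entropy(probs)
    if rng.random() < explore_prob:
        # Uncertain policy: sample an action from the softmax distribution.
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    # Confident policy: act greedily on the state-action values.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def adjust_exploration_rate(rate, own_return, peer_returns,
                            step=0.05, low=0.05, high=1.0):
    """Slow rule (assumed): explore more when lagging the peers' mean return."""
    peer_mean = sum(peer_returns) / len(peer_returns)
    rate = rate + step if own_return < peer_mean else rate - step
    return min(high, max(low, rate))
```

In this sketch, `select_action` would be called at every time step, while `adjust_exploration_rate` would be called once per evaluation window with the returns the agents collected over that window. The source also mentions the exploration degree of other agents as an input to the slower adjustment; that signal is omitted here because the abstract does not say how it is measured.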