In recent years, with the maturation and rapid development of AI technology, methods represented by deep reinforcement learning have achieved breakthroughs in single-agent settings. Because many real-world tasks require team collaboration, researchers have gradually extended deep reinforcement learning to the multi-agent domain. The multi-agent cooperative coverage problem is one of the most critical problems in this field. In a multi-agent cooperative coverage task, each agent observes not only its own local information but also global information about the other agents, which leads to slow convergence, unstable models, and related issues. To address these problems, this paper carries out the following research.

First, to address the low efficiency and slow learning speed of environment exploration in multi-agent cooperative coverage, this paper proposes a multi-agent cooperative exploration framework based on an adaptive noise policy (ANPEF). An adaptive noise policy network (ANPN) is first proposed, which adaptively and dynamically updates its parameters according to the agent's iteration count and makes action decisions based on the agent's observed state. Combined with the centralized training and decentralized execution framework, the adaptive-noise-based cooperative exploration framework ANPEF is then proposed, which enables in-depth exploration of complex multi-agent environments and reduces the likelihood of falling into local optima. Finally, the effectiveness and generality of the framework are studied by applying ANPEF to the MAAC and MADDPG algorithms, yielding ANPEF-MAAC and ANPEF-MADDPG, which address the incomplete and unstable environment exploration of the multi-agent cooperative coverage model.

Second, to address the low utilization of multi-agent experience samples and poor team rewards, this paper proposes an experience replay buffer mechanism based on meta-learning error classification (MECER). An error classification experience replay mechanism (ECER) is first proposed, which divides the experience pool into a recommendation pool and a common experience pool and stores samples separately, using the TD error of each experience sample as the importance criterion. Taking the idea of meta-learning as its starting point, the meta-learning error classification experience replay mechanism MECER is then proposed, which dynamically adjusts the sampling ratio parameters of the experience pools based on learned historical knowledge and improves the utilization of high-quality samples. Finally, MECER is applied to the MAAC and MADDPG algorithms to address the insufficient utilization of experience samples and slow convergence in multi-agent cooperative coverage tasks.

Finally, the two algorithms incorporating the ANPEF framework and the MECER mechanism are implemented on two cooperative coverage tasks, cooperative communication and cooperative navigation, in the abstract multi-particle environment, and are compared with existing algorithms; ablation experiments on ECER and MECER further verify the effectiveness of the proposed algorithms.
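To make the adaptive noise idea concrete, the following is a minimal sketch of iteration-dependent exploration noise: the noise scale is recomputed from the current iteration count and added to the policy's action. The class name, decay schedule, and parameter values are illustrative assumptions, not the thesis's exact ANPN formulation, which is a learned policy network.

```python
import numpy as np

class AdaptiveNoisePolicy:
    """Illustrative adaptive-noise action selection: exploration noise is
    broad early in training and narrows as the iteration count grows.
    (Hypothetical sketch; the actual ANPN updates network parameters.)"""

    def __init__(self, act_dim, sigma_init=0.3, sigma_min=0.02, decay=1e-4):
        self.act_dim = act_dim
        self.sigma_init = sigma_init
        self.sigma_min = sigma_min
        self.decay = decay

    def noise_scale(self, iteration):
        # Exponentially decay the noise scale with the iteration count.
        return max(self.sigma_min, self.sigma_init * np.exp(-self.decay * iteration))

    def act(self, mean_action, iteration):
        # Perturb the deterministic action with iteration-dependent Gaussian noise.
        sigma = self.noise_scale(iteration)
        noisy = np.asarray(mean_action) + np.random.normal(0.0, sigma, size=self.act_dim)
        return np.clip(noisy, -1.0, 1.0)
```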
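Similarly, a minimal sketch of the two-pool replay idea behind ECER/MECER is given below: transitions whose TD error exceeds a threshold are stored in a recommendation pool, the rest in a common pool, and each batch is drawn as a mixture controlled by a sampling ratio that an outer (meta-learning) loop could adjust. The threshold, ratio, and method names are assumptions for illustration only.

```python
import random
from collections import deque

class ErrorClassifiedReplay:
    """Illustrative two-pool replay buffer keyed on TD-error magnitude.
    (Hypothetical sketch; parameter values are placeholders.)"""

    def __init__(self, capacity=100000, td_threshold=0.5, ratio=0.5):
        self.recommended = deque(maxlen=capacity)  # high-TD-error samples
        self.common = deque(maxlen=capacity)       # remaining samples
        self.td_threshold = td_threshold
        self.ratio = ratio  # fraction of each batch drawn from the recommended pool

    def add(self, transition, td_error):
        # Classify the sample by the magnitude of its TD error.
        if abs(td_error) >= self.td_threshold:
            self.recommended.append(transition)
        else:
            self.common.append(transition)

    def sample(self, batch_size):
        # Draw a mixed batch from both pools according to the current ratio.
        n_rec = min(int(batch_size * self.ratio), len(self.recommended))
        n_com = min(batch_size - n_rec, len(self.common))
        batch = (random.sample(list(self.recommended), n_rec)
                 + random.sample(list(self.common), n_com))
        random.shuffle(batch)
        return batch

    def set_ratio(self, ratio):
        # Hook for an outer meta-learning loop to adjust the sampling mixture.
        self.ratio = float(min(max(ratio, 0.0), 1.0))
```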