With the continuous progress of machine learning and swarm intelligence, multi-agent collaboration has gradually been introduced into the field of artificial intelligence. Multi-agent collaborative systems are applied in many fields, such as UAV cooperative planning and robot cooperative control. However, as the number of agents grows and task scenarios change, existing cooperative policies may not apply effectively to new tasks. Because reinforcement learning relies on sparse interactions with the environment, the resource cost of relearning a cooperative policy from scratch for every new task is prohibitive. Making rational use of policy knowledge from existing related tasks, following the idea of transfer learning in reinforcement learning, is therefore the key to accelerating cooperative policy learning among agents on new tasks. Most current transfer methods rely on the prior knowledge of human experts, so their applicable scenarios are limited; transfer methods based on deep networks are widely applicable, but their policy learning performance is hard to guarantee across different tasks. This work therefore addresses the multi-agent cooperative knowledge transfer problem and optimizes policy transfer performance by improving general-purpose transfer methods. The main contributions are as follows:

(1) For transfer between homogeneous tasks, where the task scenario stays the same but the number of agents changes, a knowledge distillation method combined with a domain separation network (DSN-KD) is proposed. The policy network of an agent that performs well on the source task serves as the teacher model, and a domain-separated network structure corrects the teacher's outputs, which are then used as supervision to guide agent learning on the new task. The method requires no pre-designed or pre-trained state-action mappings, which reduces the cost of transfer. Experimental results in a UAV surveillance scenario and, in the multi-agent particle simulation environment, in UAV cooperative target-point occupation, robot cooperative box pushing, UAV cooperative target striking, and multi-agent cooperative material recovery show that the DSN-KD transfer method effectively speeds up policy learning on the new task and brings the learned policy closer to the theoretical optimum.

(2) For transfer between heterogeneous tasks with different scenarios or different agents, an improved progressive neural network method (DSN-PNN) is proposed. The method fits the policy distributions of multiple different agents with a Gaussian mixture model and assigns a pre-trained policy to each agent in the target task according to its value function. The multiple pre-trained policy models, combined with domain separation modules, form a multi-column progressive network that passes features to the target agent's network to guide its learning. Experimental results on different transfer tasks in UAV cooperative dynamic target pursuit, UAV cooperative target-point occupation, and multi-agent cooperative material recovery show that the method improves both the learning speed and the final performance of cooperative policies across different tasks.
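The distillation step in (1) can be sketched roughly as follows, assuming a discrete action space. The domain-separation correction of the teacher's outputs is abstracted away (the corrected logits are simply passed in), and all names here (`distillation_loss`, `alpha`, `temperature`) are illustrative, not the thesis's actual implementation:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert action logits into a probability distribution over actions."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, task_loss,
                      alpha=0.5, temperature=2.0):
    """Blend imitation of the (corrected) teacher policy with the student's
    own task objective: alpha weights the distillation term, and the
    temperature softens both distributions before comparison."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return alpha * kl_divergence(p_teacher, p_student) + (1 - alpha) * task_loss
```

When the student's action distribution matches the teacher's, the KL term vanishes and only the task loss remains, so the student is free to outperform the teacher once it has absorbed the transferred knowledge.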
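The two mechanisms in (2) can likewise be sketched in simplified form. Here the Gaussian-mixture fitting is abstracted into per-agent value estimates for each candidate source policy, and the progressive-network layer shows only the lateral feature connections; the function names and shapes are assumptions for illustration, not the method's actual code:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def assign_source_policies(value_estimates):
    """value_estimates[i][k] is the estimated return when target agent i
    follows pre-trained source policy k in the new task. Each target agent
    is assigned the source policy with the highest estimated value."""
    return [max(range(len(row)), key=row.__getitem__) for row in value_estimates]

def progressive_layer(x, W_target, lateral_Us, source_hiddens):
    """One layer of a progressive network: the target column's activation
    sums its own transform of the input with lateral transforms of the
    frozen source columns' hidden features."""
    out = matvec(W_target, x)
    for U, h in zip(lateral_Us, source_hiddens):
        out = [o + l for o, l in zip(out, matvec(U, h))]
    return relu(out)
```

Because the source columns stay frozen, the lateral terms inject the pre-trained policies' features into the target agent's network without overwriting them, which is what lets the target column reuse knowledge while learning the new task.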