Multi-agent cooperative decision-making algorithms coordinate the resources and activities of the agents in a multi-agent system so that they achieve a common goal. They have broad application prospects in intelligent transportation systems, smart grids, wireless sensor networks, and the cooperative control of UAV swarms. Among these approaches, multi-agent deep reinforcement learning (MADRL) methods based on the Markov decision process have performed well on robot cooperative control problems. These methods can learn online by updating on observed data, so they adapt well to the environment when deployed on end-side (edge) devices. However, end-side embedded processors have limited computing power, which is a real constraint given the computational complexity of MADRL algorithms. Deploying these algorithms efficiently on embedded processor platforms is therefore an active research issue in embedded intelligent computing.

This paper is driven by the decision-making requirements of target assignment and path planning in autonomous cooperative ground attack by multiple UAVs. To meet these requirements, it designs a MADRL algorithm and implements it on the Zynq platform. The main work is as follows:

1. Based on the requirements of the cooperative path planning task for multi-UAV ground attack, an overall design scheme and research approach are proposed. The core idea is to design the multi-agent cooperative decision-making algorithm according to the application requirements and the resource constraints of the embedded processor platform. The algorithm is then deployed in a distributed manner on the Zynq platform.
2. For the algorithm design, this paper takes multi-UAV ground-attack path planning as the application background for the multi-agent cooperative decision-making task. It analyzes the decision-making process, establishes an environment model, and designs a value-decomposition-based MADRL algorithm to complete the cooperative path planning task. A multi-target assignment algorithm assists in computing the system reward, which speeds up learning. Additional system reward terms are also defined to discourage UAVs from colliding with one another and from entering threat areas.
3. Based on a streaming architecture, a hardware accelerator for the basic operator units is designed. The accelerator supports inter-layer pipelining and has adjustable parallelism parameters. The distributed implementation of the algorithm is then completed through three parts: ARM invocation of the hardware accelerator, hardware/software co-computation, and the design of inter-agent communication.
4. Building on the above, a verification platform for the implementation is built on the Zynq processor, and the performance of the implementation is tested and analyzed. The results show that, with the algorithm and embedded deployment method proposed in this paper, the single-step execution time of the algorithm is 0.355 ms and the time for a single training and inference pass is 1.521 ms. This compares favorably with the end-side ARM platform. The algorithm was also verified in the application scenario, and the results show that the Zynq implementation can complete the multi-UAV cooperative decision-making task.
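The value-decomposition idea in point 2 can be illustrated with its simplest instance, VDN-style additive decomposition. In VDN, the team value is the sum of per-agent utilities. The abstract does not say which decomposition the thesis uses. QMIX, for example, replaces the sum with a monotonic mixing network. The Python sketch below is therefore an assumption. The function names and the plain-list representation of Q-values are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: agent_qs[i][a] is agent i's utility Q_i(o_i, a).
QTable = Sequence[Sequence[float]]


def vdn_q_total(agent_qs: QTable, actions: Sequence[int]) -> float:
    """Additive (VDN-style) decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    return sum(q[a] for q, a in zip(agent_qs, actions))


def greedy_joint_action(agent_qs: QTable) -> List[int]:
    """Each agent maximizes its own Q_i. Because Q_tot is a sum, this also
    maximizes Q_tot, so decentralized greedy execution matches the joint argmax."""
    return [max(range(len(q)), key=q.__getitem__) for q in agent_qs]


def td_target(team_reward: float, next_agent_qs: QTable,
              gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD target on the team value. In practice next_agent_qs would
    come from a separate target network; that detail is omitted here."""
    if done:
        return team_reward
    next_actions = greedy_joint_action(next_agent_qs)
    return team_reward + gamma * vdn_q_total(next_agent_qs, next_actions)


def td_loss(agent_qs: QTable, actions: Sequence[int], team_reward: float,
            next_agent_qs: QTable, gamma: float = 0.99, done: bool = False) -> float:
    """Squared TD error of Q_tot. When each Q_i is a network output,
    minimizing this single loss trains all agents jointly."""
    target = td_target(team_reward, next_agent_qs, gamma, done)
    return (vdn_q_total(agent_qs, actions) - target) ** 2


if __name__ == "__main__":
    qs = [[1.0, 3.0], [2.0, 0.5]]
    print(greedy_joint_action(qs))        # [1, 0]
    print(td_target(1.0, qs, gamma=0.5))  # 1.0 + 0.5 * (3.0 + 2.0) = 3.5
```

The appeal of this family of methods for a distributed embedded deployment is visible in `greedy_joint_action`. Each agent needs only its own Q-values to act, and the sum is only needed during training.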
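Point 2 also states that a multi-target assignment algorithm assists the reward computation. The abstract specifies neither the assignment method nor the reward formula, so the sketch below makes two assumptions. First, targets are assigned by exhaustive search for the minimum total UAV-to-target distance. This is practical only for a few UAVs; the Hungarian algorithm scales better. Second, each UAV is rewarded for reducing its distance to its assigned target. The 2-D point model, `assign_targets`, and `progress_reward` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical 2-D position model


def assign_targets(uavs: Sequence[Point], targets: Sequence[Point]) -> List[int]:
    """Return assignment[i] = index of the target given to UAV i, minimizing
    the total UAV-to-target distance. Exhaustive search over permutations,
    so only suitable for small teams."""
    if len(uavs) > len(targets):
        raise ValueError("this sketch assumes at least as many targets as UAVs")
    best_cost, best = math.inf, []
    for perm in itertools.permutations(range(len(targets)), len(uavs)):
        cost = sum(math.dist(u, targets[t]) for u, t in zip(uavs, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best


def progress_reward(prev_pos: Sequence[Point], cur_pos: Sequence[Point],
                    targets: Sequence[Point], assignment: Sequence[int],
                    scale: float = 1.0) -> List[float]:
    """Shaping term: positive when a UAV moves closer to its assigned target."""
    return [scale * (math.dist(p, targets[t]) - math.dist(c, targets[t]))
            for p, c, t in zip(prev_pos, cur_pos, assignment)]


if __name__ == "__main__":
    uavs = [(0.0, 0.0), (10.0, 0.0)]
    targets = [(10.0, 1.0), (0.0, 1.0)]
    plan = assign_targets(uavs, targets)
    print(plan)  # [1, 0]: each UAV takes the nearer target
    print(progress_reward(uavs, [(0.0, 0.5), (10.0, 0.5)], targets, plan))  # [0.5, 0.5]
```

A dense per-step signal like this is one plausible way an assignment could "speed up learning." It replaces a sparse reward that only arrives when a target is reached.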
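Point 2 also mentions additional reward terms that discourage collisions and entry into threat areas. Those terms could take a form like the following sketch. The minimum separation, the circular threat-zone model, and the penalty magnitudes are hypothetical placeholders, not values from the thesis.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Threat = Tuple[float, float, float]  # (center_x, center_y, radius), hypothetical model


def safety_penalties(positions: Sequence[Point], threats: Sequence[Threat],
                     min_sep: float = 1.0, collision_penalty: float = -10.0,
                     threat_penalty: float = -5.0) -> List[float]:
    """Per-UAV penalty: both UAVs in any pair closer than min_sep are penalized,
    and each UAV inside a threat circle is penalized once per such circle."""
    penalties = [0.0] * len(positions)
    # Pairwise separation check for collision avoidance.
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < min_sep:
            penalties[i] += collision_penalty
            penalties[j] += collision_penalty
    # Threat-area check against every circular zone.
    for k, pos in enumerate(positions):
        for cx, cy, radius in threats:
            if math.dist(pos, (cx, cy)) < radius:
                penalties[k] += threat_penalty
    return penalties


if __name__ == "__main__":
    print(safety_penalties([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)],
                           [(5.0, 5.0, 1.0)]))  # [-10.0, -10.0, -5.0]
```

Under these assumptions, the per-UAV penalties would be added to the shaping reward and summed into the team reward that a value-decomposition learner trains on.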
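Point 3 describes a streaming accelerator with inter-layer pipelining and adjustable parallelism. A simplified cycle model shows why both features matter. In this model, each fully connected layer is a pipeline stage with `parallelism` multiply-accumulate units. Each stage therefore needs `ceil(n_in * n_out / parallelism)` cycles per input vector. Once the stages overlap, throughput is set by the slowest stage. The model ignores memory transfers, FIFO depths, and adder-tree latency, and the layer sizes are hypothetical. It is not the thesis's accelerator design.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DenseStage:
    """One fully connected layer mapped to its own hardware stage (hypothetical)."""
    n_in: int
    n_out: int
    parallelism: int  # MAC units assigned to this layer: the adjustable knob

    def cycles_per_input(self) -> int:
        # n_in * n_out multiply-accumulates spread over `parallelism` units.
        return math.ceil(self.n_in * self.n_out / self.parallelism)


def sequential_cycles(stages: Sequence[DenseStage], n_inputs: int) -> int:
    """No inter-layer pipelining: each input passes through every layer before the next starts."""
    return n_inputs * sum(s.cycles_per_input() for s in stages)


def pipelined_cycles(stages: Sequence[DenseStage], n_inputs: int) -> int:
    """Layer-pipelined streaming: the first input pays the full latency, then one
    result emerges every max(stage cycles), i.e. the slowest stage sets throughput."""
    per_stage = [s.cycles_per_input() for s in stages]
    return sum(per_stage) + (n_inputs - 1) * max(per_stage)


if __name__ == "__main__":
    # Hypothetical Q-network: 16 observation inputs -> 64 -> 64 -> 5 actions.
    net = [DenseStage(16, 64, 16), DenseStage(64, 64, 64), DenseStage(64, 5, 8)]
    print([s.cycles_per_input() for s in net])  # [64, 64, 40]
    print(sequential_cycles(net, 100))          # 16800
    print(pipelined_cycles(net, 100))           # 6504
```

Choose each layer's parallelism so the per-stage cycle counts are roughly equal. In this example, the first two layers are balanced at 64 cycles. Balancing avoids spending DSP resources on stages that would otherwise sit idle waiting for the bottleneck stage.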