With the continuous development of science and technology, the low cost and convenience of UAV technology have led to its widespread use, and it has become a key research focus in the strategic goals of many countries. Path planning is an important component of UAV technology. In UAV path planning tasks, the traditional A* algorithm and the artificial potential field (APF) method are the classical approaches; intelligent optimization algorithms, such as the genetic algorithm and the ant colony algorithm, are also used. In unknown environments, however, the performance of the classical algorithms is often unsatisfactory, while the intelligent optimization algorithms suffer from complex models and heavy computation, making it difficult for path planning to meet real-time requirements in complex map environments. To address these shortcomings, this paper proposes corresponding reinforcement learning algorithms to improve path planning both for the single-UAV scenario and for the more complex multi-UAV cooperative task scenario. The research consists of the following two parts:

(1) For the single-UAV path planning problem, factors such as mission objectives, environmental perception, and constraints are analyzed first. A reward function is then designed by combining the deep reinforcement learning algorithm Double DQN (DDQN) with the idea of the artificial potential field method, and the APF-DDQN model is proposed. This model mitigates the overestimation of Q values in the DQN algorithm, while the reward function, built on the attractive-field and repulsive-field ideas of the APF method, alleviates the slow convergence and the failure to explore new states caused by the sparse environmental rewards typical of reinforcement learning. On this basis, a UAV simulation environment for training reinforcement learning algorithms is developed on the framework of OpenAI
Gym. The simulation results show that the proposed APF-DDQN model converges faster and completes each task in a shorter average number of steps than the DDQN model. Compared with the APF model, the APF-DDQN model increases the number of successful tasks from 68 to 99 out of 100 tests, which alleviates the local-minimum problem of the APF model.

(2) For the multi-UAV, multi-task path planning problem, the APF-MAA2C model is proposed by combining the artificial potential field method with the multi-agent algorithm MA2C. In this model, the advantage function is used as the criterion for action selection, measuring the value of each action relative to the average value of all actions. At the same time, the APF guides the training of the MAA2C algorithm and speeds up the convergence of the model. Because multi-UAV path planning is a cooperative planning problem, the APF-MAA2C model must also take into account the states, actions, and reward feedback of the other UAVs in the environment. To guide each UAV to avoid the others, a repulsive force field generated by the other UAVs is added to the environment as a penalty reward. Experiments show that, compared with the MA2C model, the APF-MAA2C model increases the number of successful missions from 68 to 99 out of 100 tests, better avoiding possible collisions among the UAVs and ensuring that they reach their mission sites and complete their missions.
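The core mechanism shared by both models is a reward shaped from the APF's attractive and repulsive fields: the goal attracts the UAV, while obstacles (and, in the multi-UAV case, other UAVs) repel it within a limited influence radius. The abstract does not give the exact formulation, so the following is only a minimal sketch of one standard APF-style shaping reward; the gains `k_att` and `k_rep` and the influence radius `d0` are hypothetical parameter names, not taken from the thesis.

```python
import math

def apf_reward(pos, goal, repellers, k_att=1.0, k_rep=100.0, d0=5.0):
    """Sketch of an APF-shaped reward.

    pos, goal : (x, y) tuples for the UAV and its mission site.
    repellers : list of (x, y) positions of obstacles or other UAVs.
    k_att, k_rep, d0 : assumed attractive gain, repulsive gain,
                       and repulsive influence radius.
    """
    # Attractive field: reward increases (is less negative)
    # as the UAV approaches the goal.
    reward = -k_att * math.dist(pos, goal)
    # Repulsive field: penalty applies only inside the radius d0,
    # growing sharply as the UAV nears a repeller.
    for obs in repellers:
        d = math.dist(pos, obs)
        if 0.0 < d < d0:
            reward -= k_rep * (1.0 / d - 1.0 / d0) ** 2
    return reward
```

Used as the per-step reward in DDQN or MAA2C training, this denser signal replaces the sparse goal-only feedback: a state closer to the goal always scores higher than a farther one, while states near an obstacle or another UAV are penalized, which is how the repulsive field discourages inter-UAV collisions in the cooperative setting.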