| With the development of artificial intelligence, applying AI techniques to underwater robot path planning has attracted growing attention from researchers. Compared with AUV path planning under known environmental information, path planning under unknown environmental information is more challenging. Reinforcement learning is an important artificial intelligence method that learns through interaction with the environment. Through continuous trial and error and exploration, the algorithm gradually learns a decision model, acquires flexible planning and obstacle avoidance capabilities, and can be used to solve robot path planning in unknown environments. In this paper, a DQN reinforcement learning algorithm based on two neural networks is used to solve the AUV local path planning problem. Training covers obstacle avoidance and detouring along obstacle boundaries. Firstly, a value function network is used to fit the Q-value table and a target network is introduced; this solves the "curse of dimensionality" of Q-table learning, and the target network breaks the correlation in value function updates, which enhances the learning ability of the algorithm. Secondly, a memory pool experience replay method based on "stunning" is proposed. During training, previously learned experience is reviewed so that the algorithm retains the decisions it has learned, and the correlation between samples is broken. Finally, according to the difference between the obstacle avoidance and detour tasks, two real-time evaluation functions are proposed. Each step of the AUV receives a good or bad evaluation value, which solves the sparse reward problem of reinforcement learning. Experiments demonstrate the validity of the two evaluation functions. Simulation experiments are carried out in a Python development environment. In the experiments, the model training process is analyzed and the self-learning ability of the algorithm is demonstrated. The trained model is adaptive and general, and can avoid and detour around unknown obstacles. In addition, comparison experiments verify that the DQN algorithm with a target network and the prioritized experience replay method based on "stunning" improve the learning ability of the algorithm and reduce the correlation between samples. |
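
The two mechanisms named in the abstract, a separate target network and priority-weighted experience replay, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not define how "stunning" is measured, so the sketch assumes the priority is the magnitude of the temporal-difference (TD) error. The names `td_target` and `PrioritizedMemory`, the discount factor, and the example transitions are all hypothetical.

```python
import random

GAMMA = 0.9  # discount factor (assumed value, not taken from the paper)


def td_target(reward, next_q_target, done, gamma=GAMMA):
    """Compute the DQN learning target for one transition.

    next_q_target holds the Q-values of the next state as predicted by the
    target network, not by the online value network being trained.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


class PrioritizedMemory:
    """Replay memory that samples transitions in proportion to |TD error|.

    Assumption: |TD error| stands in for the "stunning" measure.
    """

    def __init__(self, capacity, eps=1e-3):
        self.capacity = capacity
        self.eps = eps  # keeps every priority strictly positive
        self.items = []
        self.priorities = []

    def add(self, transition, td_error):
        if len(self.items) >= self.capacity:
            # Memory is full: drop the oldest transition.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, batch_size, rng=random):
        # Weighted sampling with replacement; a larger |TD error| is drawn more often.
        return rng.choices(self.items, weights=self.priorities, k=batch_size)


# Usage with made-up transitions of the form (state, action, reward, next_state, done).
memory = PrioritizedMemory(capacity=3)
memory.add(("s0", 0, -1.0, "s1", False), td_error=0.1)
memory.add(("s1", 2, -1.0, "s2", False), td_error=2.5)
batch = memory.sample(batch_size=4)
y = td_target(reward=-1.0, next_q_target=[0.5, 2.0, 1.0], done=False)
```

Sampling by priority instead of in stored order is what breaks the correlation between consecutive samples. Computing the target from a separately held network keeps the regression target from shifting with every update of the value network.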