| With the rapid development of science and technology, mobile robot technology has made major breakthroughs. Path planning, a key problem in robotics, means getting the robot from a starting point to a target point without colliding with any obstacle along the way. Traditional path planning methods depend on a map of the environment, cannot learn, and adapt poorly, so they are of little practical use in unknown environments. Applying deep reinforcement learning to robot path planning is a frontier direction in robotics. To address robot path planning when no map is available, this paper studies a mapless path planning solution based on deep reinforcement learning. The main contributions are as follows. First, training most robot navigation models based on the Actor-Critic framework requires a large number of reward values, and the instability of these rewards leads to high variance in the policy gradient estimate. To solve this problem, this paper designs a robot navigation model based on the proximal policy optimization (PPO) algorithm. By introducing generalized advantage estimation (GAE), the hyperparameters are adjusted dynamically to balance variance and bias, so that training of the PPO navigation model can be handled flexibly in different situations and the navigation performance of the robot is effectively improved. To address the problem that the PPO navigation model can use only a small amount of robot state information, this paper designs an LSTM-Critic network, which exploits the long-term memory of the Long Short-Term Memory (LSTM) neural network to encode the robot's historical state information. The LSTM-Critic network lets the robot refer to earlier state information when planning a path, which improves navigation performance. Second, to enable effective exploration by the robot in unknown environments, this paper introduces a curiosity mechanism. Existing curiosity mechanisms, however, have a limitation: they cannot remember and store the robot's previous states, so the model lacks temporal information, which affects the convergence rate of the navigation model. This paper addresses this limitation, and the improved curiosity mechanism is used to better motivate the robot to visit new locations. The module predicts the next state from the current state and action, and its prediction error serves as an intrinsic reward, through which the mobile robot can effectively explore the unknown environment. Finally, the Robot Operating System (ROS) and the Gazebo simulator are used to create a robot model and build a simulation environment, and experiments verify the effectiveness of the proposed algorithm. |
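
The prediction-error reward described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `intrinsic_reward` and `toy_forward_model`, the `scale` factor, and the squared-error form are all assumptions made for illustration. The memory extension that the paper adds to the curiosity mechanism is not modelled here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def intrinsic_reward(
    forward_model: Callable[[Vector, Vector], List[float]],
    state: Vector,
    action: Vector,
    next_state: Vector,
    scale: float = 0.5,  # assumed scaling factor, not taken from the paper
) -> float:
    """Return the scaled squared error between predicted and observed next state."""
    predicted = forward_model(state, action)
    squared_error = sum((p - n) ** 2 for p, n in zip(predicted, next_state))
    return scale * squared_error


def toy_forward_model(state: Vector, action: Vector) -> List[float]:
    """Hypothetical stand-in for a learned model: assumes the robot moves by `action`."""
    return [s + a for s, a in zip(state, action)]


# A transition the model predicts exactly yields no curiosity bonus.
familiar = intrinsic_reward(toy_forward_model, [0.0, 0.0], [1.0, 0.0], [1.0, 0.0])

# A transition the model mispredicts yields a positive bonus.
novel = intrinsic_reward(toy_forward_model, [0.0, 0.0], [1.0, 0.0], [1.0, 2.0])
```

In this sketch a well-predicted transition earns no bonus while a surprising one does, which is what pushes the agent toward unvisited locations. In practice the forward model would be a trained network rather than a fixed function.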