With the rapid development of industries such as intelligent logistics and intelligent manufacturing, intelligent warehouses are being widely adopted across many fields. In an intelligent warehouse, robots must rely on algorithms to plan the shortest path from a starting point to a target point while avoiding collisions along the way. Research on path planning methods for intelligent warehouse robots is therefore of great practical importance for improving warehouse efficiency. At present, most traditional path planning algorithms rely on known static map information and can only plan a path from a fixed starting point to a fixed end point, which makes them ill-suited to the dynamic environments and tasks of an intelligent warehouse. Methods based on deep reinforcement learning learn and make decisions autonomously through the robot's interaction with the environment, but they also have shortcomings such as slow convergence, low exploration efficiency, and difficult training. Targeting the robot path planning problem in two specific intelligent-warehouse scenarios, this thesis proposes correspondingly improved deep reinforcement learning algorithm models. The main work is as follows:

(1) For the scenario in which a single robot's task target point changes dynamically and dynamic obstacles are present in the warehouse, a PPO-GRU algorithm is proposed. Building on the original PPO algorithm, a GRU module is added to the policy network to address slow convergence under partial observability, improving the network's ability to learn from sequential sample data and thus the training efficiency of the algorithm. To counter inefficient exploration caused by sparse environmental rewards, a non-sparse external reward function is designed, and a curiosity module is introduced to generate intrinsic rewards that encourage exploration; the sum of the external and intrinsic rewards serves as the total reward guiding the robot's action decisions. Comparative experiments in different scenarios and on maps of different sizes show that PPO-GRU converges faster and achieves a higher path-planning success rate than the original PPO algorithm and other deep reinforcement learning algorithms, and that it has a shorter running time and better path-planning results than traditional path planning algorithms.
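The abstract does not specify the network architecture in detail; the following is a minimal PyTorch sketch of the kind of GRU-augmented actor that the PPO-GRU design describes, where the layer sizes, observation dimension, and discrete action set are illustrative assumptions rather than the thesis's actual values.

```python
import torch
import torch.nn as nn

class GRUPolicy(nn.Module):
    """Sketch of a recurrent PPO actor: an MLP encoder feeds a GRU whose
    hidden state summarizes the observation history, which is what lets the
    policy cope with partial observability. All sizes are assumptions."""

    def __init__(self, obs_dim=16, hidden_dim=64, n_actions=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.pi_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: (1, batch, hidden_dim) or None
        x = self.encoder(obs_seq)
        out, hn = self.gru(x, h0)          # hidden state carries the history
        logits = self.pi_head(out)         # per-step action logits
        return torch.distributions.Categorical(logits=logits), hn
```

During rollouts the hidden state `hn` is carried across steps, so consecutive partial observations accumulate into the decision context instead of being treated independently.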
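The curiosity module is likewise only named, not specified. One common realization is an ICM-style forward model whose prediction error serves as the intrinsic reward; the sketch below assumes that formulation, with the feature dimension and the scale `eta` chosen arbitrarily. The total reward is then the sum of the designed external reward and this intrinsic term, as stated above.

```python
import torch
import torch.nn as nn

class ForwardCuriosity(nn.Module):
    """ICM-style forward model: the intrinsic reward is the error in
    predicting the next state feature from the current feature and action.
    Dimensions and the eta scale are illustrative assumptions."""

    def __init__(self, feat_dim=32, n_actions=4, eta=0.1):
        super().__init__()
        self.eta = eta
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 64), nn.ReLU(),
            nn.Linear(64, feat_dim))

    def intrinsic_reward(self, phi_s, action_onehot, phi_s_next):
        pred = self.forward_model(torch.cat([phi_s, action_onehot], dim=-1))
        err = 0.5 * (pred - phi_s_next).pow(2).sum(dim=-1)
        # err also serves as the forward-model training loss; detach it
        # before using it as a reward signal.
        return self.eta * err

# Total reward guiding the policy, as described in the abstract:
#   r_total = r_external (dense, hand-designed) + r_intrinsic (curiosity)
```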
(2) For the scenario in which multiple robots must cooperate to complete multiple tasks in an intelligent warehouse, an APF-MAPPO algorithm is proposed. To address the difficulty of representing the environment state when multiple robots pursue multiple target points, an environmental potential field map is constructed with the artificial potential field (APF) method and used as the state-representation input; the state information provided by the APF improves the robots' learning efficiency. In addition, because the high-dimensional state space generated by multiple robots makes training difficult, the value network is restructured with a self-attention mechanism so that, when evaluating state values, the algorithm attends more to the information that benefits path planning. Comparative experiments on maps of different sizes show that APF-MAPPO achieves a higher average reward and task completion rate than the original MAPPO algorithm and produces better paths, and that it has a shorter running time and better path-planning results than traditional path planning algorithms.
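The concrete potential functions are not given in the abstract. A minimal sketch under the classical APF formulation (quadratic attractive potential toward the nearest goal, inverse-distance repulsive potential near obstacles) might look as follows; the gains `k_att`, `k_rep` and the influence distance `d0` are assumed constants, and the thesis's actual formulation may differ.

```python
import numpy as np

def potential_field_map(grid, goals, k_att=1.0, k_rep=5.0, d0=2.0):
    """Sketch of an APF state map for a grid warehouse.
    grid: 2D array, 1 = obstacle, 0 = free; goals: list of (row, col)."""
    h, w = grid.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Attractive term: quadratic in the distance to the nearest goal.
    d_goal = np.min([np.hypot(rows - r, cols - c) for r, c in goals], axis=0)
    u_att = 0.5 * k_att * d_goal ** 2
    # Repulsive term: classic APF form, active within d0 of an obstacle.
    obstacles = np.argwhere(grid == 1)
    if len(obstacles):
        d_obs = np.min(
            [np.hypot(rows - r, cols - c) for r, c in obstacles], axis=0)
        d_obs = np.maximum(d_obs, 1e-6)  # avoid division by zero on obstacles
        u_rep = np.where(d_obs <= d0,
                         0.5 * k_rep * (1.0 / d_obs - 1.0 / d0) ** 2, 0.0)
    else:
        u_rep = 0.0
    return u_att + u_rep  # fed to the networks as the state representation
```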
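Similarly, the improved value network is described only as combining a self-attention mechanism. The following PyTorch sketch shows one plausible form, in which per-robot state embeddings attend to one another before being pooled into a single state value; all dimensions and the mean-pooling choice are assumptions, not the thesis's reported architecture.

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """Sketch of a centralized value network with self-attention over
    per-robot state embeddings, so the critic can weight the parts of the
    high-dimensional joint state most relevant to path planning."""

    def __init__(self, robot_feat_dim=32, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(robot_feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.v_head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 1))

    def forward(self, robot_states):
        # robot_states: (batch, n_robots, robot_feat_dim)
        x = self.embed(robot_states)
        x, _ = self.attn(x, x, x)          # each robot attends to all others
        return self.v_head(x.mean(dim=1))  # pooled joint-state value
```

Because attention weights are computed per robot pair, this structure also scales with the number of robots more gracefully than a flat MLP over the concatenated joint state.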