
Research On Reinforcement-learning-based Multi-robot Cooperative Path Planning Algorithm In The Warehouse

Posted on: 2023-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Shi
GTID: 2558307118499464
Subject: Software engineering
Abstract/Summary:
The rapid growth of online shopping keeps increasing logistics demand in warehouses, so time-efficient robot path planning algorithms are key to improving the efficiency of intelligent warehousing systems. Existing single-robot path planning methods rely on static information about the environment and frequently push nodes onto and pop them from the open list, which reduces search efficiency. Moreover, they cannot handle collisions between multiple robots, so they do not meet the requirements of warehouse environments. Reinforcement-learning-based methods feed local observations to a network, so that the agent learns a collision-avoidance and navigation policy by interacting with the environment; however, because they lack global information, they take a long time to converge. To solve these problems, this thesis first proposes a time-efficient single-robot path planning method that models the warehouse layout and computes a collision-free global path for the robot. This global path is then used to compute a global path observation and an off-path penalty. In addition, the repulsion and target attraction of the social force model are used to compute a behavior vector for the robot, which is fed to the network. With these methods, this thesis trains a policy in warehouse environments that completes navigation tasks while avoiding collisions and can be extended to multiple robots. The main research work consists of the following parts:

(1) To address the loss of search efficiency caused by frequently pushing and popping nodes, a path planning algorithm based on the warehouse layout is proposed. It uses the layout information of the warehouse to compute costs and quickly plans a global path for a robot. General search-based single-robot path planning methods are not designed around the characteristics of the warehouse environment, and frequently pushing nodes onto and popping them from the open list during the search increases the search time. This thesis first analyzes and models the shelf layout of the warehouse and proposes an algorithm that automatically identifies the warehouse layout from the arrangement of the shelves. It then defines extra selected nodes and analyzes why they are generated during the search. A new cost function is proposed that computes the cost from the layout information, reducing the number of extra selected nodes. Finally, the thesis proposes the warehouse-layout-aware A* (WLAA*) algorithm to compute the path. In 5,000 simulation experiments, WLAA* searched 7.5 times faster in the warehouse than existing methods.

(2) To address the inability of single-robot path planning algorithms to handle collisions between multiple robots, this thesis uses a reinforcement-learning-based method to train a policy for cooperative collision avoidance and navigation, and improves the policy's performance by computing a global path observation and an off-path penalty from the WLAA* global path, combined with the social force model. The thesis first models multi-robot cooperative path planning in the warehouse as a multi-agent partially observable Markov decision process and uses the asynchronous advantage actor-critic (A3C) algorithm to train a policy that completes navigation tasks while avoiding collisions. Because robots otherwise take a long time to learn an effective policy, the global path observation and the off-path penalty are computed from the WLAA* global path so that each robot learns to follow the global path and avoids aimless exploration as much as possible. In addition, the concepts of pedestrian repulsion and target attraction from the social force model are introduced to compute the robot's behavior vector, which is fed to the network. The result is a trained policy that achieves cooperative collision avoidance and navigation. Simulation results in warehouse environments of different scales show that the trained policy can be deployed to multiple robots, each using its own copy of the policy to avoid collisions and complete navigation tasks, and that it achieves a higher success rate than the existing method.
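Part (1) starts from an algorithm that identifies the warehouse layout from the shelf arrangement, but the abstract does not describe how. The sketch below is a hypothetical illustration only: it assumes the warehouse is encoded as a rectangular character grid ('#' for shelf cells, '.' for free cells, an encoding chosen here) and treats any row or column containing no shelf as a full-length aisle.

```python
from typing import List, Tuple


def find_aisles(grid: List[str]) -> Tuple[List[int], List[int]]:
    """Return indices of rows and columns that contain no shelf cell ('#').

    Hypothetical encoding: '#' = shelf, '.' = free cell. A shelf-free row or
    column is treated as a full-length aisle a robot can travel along.
    """
    aisle_rows = [r for r, line in enumerate(grid) if "#" not in line]
    aisle_cols = [
        c for c in range(len(grid[0]))
        if all(line[c] != "#" for line in grid)
    ]
    return aisle_rows, aisle_cols


# Two shelf columns separated by vertical aisles, with cross aisles at top and bottom.
warehouse = [
    ".....",
    ".#.#.",
    ".#.#.",
    ".....",
]
print(find_aisles(warehouse))  # ([0, 3], [0, 2, 4])
```

A real layout detector would also need to handle shelf blocks that do not span the whole grid; this sketch only shows the kind of structural information (aisle rows and columns) a layout-aware planner could exploit.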
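The WLAA* cost function itself is not given in the abstract, so the following is not WLAA*. It is a minimal sketch of plain 4-connected A* on the same hypothetical shelf grid, using a Manhattan heuristic and a common tie-breaking rule (prefer the larger g on equal f). It counts nodes popped and expanded from the open list so that, under one possible reading of "extra selected nodes" (expanded nodes that do not end up on the returned path), their number is `popped - len(path)`. The names `astar` and `popped` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Tuple[Optional[List[Cell]], int]:
    """Plain A* on a grid where '#' is a shelf; returns (path or None, nodes expanded)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance is admissible and consistent for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Heap entries are (f, -g, cell): on equal f, the deeper node (larger g) pops first.
    open_heap: List[Tuple[int, int, Cell]] = [(h(start), 0, start)]
    g_cost: Dict[Cell, int] = {start: 0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    closed = set()
    popped = 0
    while open_heap:
        _, neg_g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale duplicate left behind by an earlier, costlier push
        closed.add(cell)
        popped += 1
        if cell == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], popped
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = -neg_g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_g + h(nxt), -new_g, nxt))
    return None, popped


warehouse = [".....", ".#.#.", ".#.#.", "....."]
path, popped = astar(warehouse, (0, 0), (3, 4))
print(len(path), popped - len(path))  # path length, expanded nodes not on the path
```

Every push and pop here costs O(log n) heap work, which is the overhead the thesis targets; a layout-aware cost would aim to shrink `popped - len(path)` without changing the optimal path length.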
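Part (2) trains the policy with A3C. In A3C each asynchronous worker collects a short rollout, builds n-step bootstrapped returns R_t = r_t + γR_{t+1} (seeded with the critic's value of the last state unless the episode ended), and uses the advantage A_t = R_t − V(s_t) in both the actor and critic losses before pushing gradients to the shared network. The sketch below shows only that per-worker target and loss arithmetic on plain floats; the discount, entropy weight, and function names are assumptions, not values from the thesis.

```python
from typing import List, Tuple


def nstep_returns_and_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    done: bool = False,
) -> Tuple[List[float], List[float]]:
    """n-step bootstrapped returns R_t and advantages A_t = R_t - V(s_t) for one rollout."""
    running = 0.0 if done else bootstrap_value  # no bootstrap past a terminal state
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


def a3c_losses(
    log_probs: List[float],
    entropies: List[float],
    advantages: List[float],
    entropy_beta: float = 0.01,
) -> Tuple[float, float]:
    """Actor loss (advantage treated as a constant) and critic loss for one rollout."""
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    policy_loss -= entropy_beta * sum(entropies)  # entropy bonus encourages exploration
    value_loss = 0.5 * sum(adv * adv for adv in advantages)
    return policy_loss, value_loss


rets, advs = nstep_returns_and_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], 2.0, gamma=0.5)
print(rets, advs)  # [1.5, 1.0, 2.0] [1.0, 0.5, 1.5]
```

In a real implementation these quantities would be tensors, and the log-probabilities and values would come from the shared actor-critic network fed with the robot's observation.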
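The abstract says the WLAA* global path is turned into a global path observation and an off-path penalty, but does not give their definitions. A minimal sketch under stated assumptions: the observation is the offsets from the robot to the next `horizon` waypoints after the nearest one (padded with the goal), and the penalty grows linearly once the robot strays more than `tolerance` from the path. The names, the linear form, and the default values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_index(path: Sequence[Point], pos: Point) -> int:
    """Index of the waypoint closest to the robot position."""
    return min(range(len(path)), key=lambda i: math.dist(path[i], pos))


def global_path_observation(path: Sequence[Point], pos: Point, horizon: int = 3) -> List[Point]:
    """Offsets (dx, dy) from the robot to the next `horizon` waypoints (hypothetical form)."""
    i = nearest_index(path, pos)
    obs = []
    for k in range(1, horizon + 1):
        wx, wy = path[min(i + k, len(path) - 1)]  # pad with the goal near the path end
        obs.append((wx - pos[0], wy - pos[1]))
    return obs


def off_path_penalty(path: Sequence[Point], pos: Point,
                     tolerance: float = 0.5, weight: float = 0.1) -> float:
    """Zero within `tolerance` of the path, then a linear negative reward (hypothetical)."""
    d = min(math.dist(p, pos) for p in path)
    return -weight * max(0.0, d - tolerance)


global_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(global_path_observation(global_path, (1.0, 0.2), horizon=2))
print(off_path_penalty(global_path, (1.0, 2.0)))
```

Feeding waypoint offsets rather than absolute coordinates keeps the observation local to the robot, which matches the abstract's aim of supplying global guidance to a policy that otherwise sees only local information.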
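The behavior vector combines the target attraction and repulsion terms of the social force model. The sketch below uses the textbook social-force form (relaxation toward a desired velocity toward the goal, plus exponential repulsion from each nearby robot along the line joining them) with other robots standing in for pedestrians. The thesis's exact formula and constants are not given in the abstract, so `v0`, `tau`, `strength`, `falloff`, and `contact_dist` are hypothetical parameters.

```python
import math
from typing import Iterable, Tuple

Vec = Tuple[float, float]


def behavior_vector(pos: Vec, vel: Vec, goal: Vec, neighbors: Iterable[Vec],
                    v0: float = 1.0, tau: float = 0.5, strength: float = 2.0,
                    falloff: float = 0.3, contact_dist: float = 0.6) -> Vec:
    """Target attraction plus summed neighbor repulsion (social-force style sketch)."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy)
    ex, ey = (gx / dist_goal, gy / dist_goal) if dist_goal > 1e-9 else (0.0, 0.0)
    # Attraction: relax current velocity toward speed v0 along the goal direction.
    fx = (v0 * ex - vel[0]) / tau
    fy = (v0 * ey - vel[1]) / tau
    for nx, ny in neighbors:
        dx, dy = pos[0] - nx, pos[1] - ny  # points away from the neighbor
        d = math.hypot(dx, dy)
        if d < 1e-9:
            continue  # coincident positions: direction undefined, skip
        # Repulsion decays exponentially with distance beyond contact_dist.
        mag = strength * math.exp((contact_dist - d) / falloff)
        fx += mag * dx / d
        fy += mag * dy / d
    return fx, fy


print(behavior_vector((0.0, 0.0), (0.0, 0.0), (3.0, 0.0), []))  # (2.0, 0.0)
print(behavior_vector((0.0, 0.0), (0.0, 0.0), (3.0, 0.0), [(0.0, 1.0)]))
```

In the second call the robot above pushes the vector downward while the goal term still points along the aisle; in the thesis this kind of vector is fed to the network as an extra input rather than used directly as a control command.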
Keywords/Search Tags: multi-robot in the warehouse, path planning, reinforcement learning, global planning, local planning