
Research On AGV Storage Path Planning Based On Reinforcement Learning

Posted on: 2022-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2518306566990709
Subject: Control Engineering
Abstract/Summary:
With the rapid development of flexible manufacturing systems and intelligent warehousing systems, demand for Automated Guided Vehicles (AGVs) is increasing. An AGV is an automatically navigating vehicle equipped with electromagnetic or optical sensors that can carry cargo and travel along a prescribed navigation path. To improve the efficiency of AGV operations and reduce transportation costs, this thesis proposes reinforcement-learning-based AGV path planning methods for the warehouse (storage) environment.

For single-AGV path planning, a hybrid reinforcement learning and ant colony algorithm is proposed. To address the shortcomings of the ant colony algorithm in certain situations, ideas from reinforcement learning are incorporated into it. The main contributions are as follows. First, the Q value is added to the state transition formula of the ant colony algorithm to increase the probability of selecting nodes on the optimal path. Second, poor nodes are penalized so that ants avoid choosing poor paths. Third, the path length and cumulative reward of each generation of ants are combined to determine the comprehensive optimal local path, whose pheromone is then reinforced; this improves both the accuracy of the optimal path and the algorithm's ability to explore. The effectiveness of the proposed method is verified by comparing its simulation results with those of the standard ant colony algorithm.

For multi-AGV path planning, multi-agent reinforcement learning (MARL) is used to optimize collaboration among AGVs, and a MARL algorithm for cooperative tasks, WRFMR (Weighted Relative Frequency of obtaining the Maximal Reward), is proposed. The main contributions are as follows. First, WRFMR requires each agent to estimate the Q function of its own actions only, rather than the Q function of joint actions, which alleviates the exponential growth of the joint action space. Second, the algorithm uses weighting parameters and action probabilities to balance exploration and exploitation, which accelerates convergence to the optimal joint action. In addition, an iterative method is used to estimate the frequency of obtaining the maximum reward, which reduces the space complexity of the algorithm.

A mathematical model of the WRFMR learning process in cooperative repeated games is established and its dynamic characteristics are analyzed, yielding the following conclusion: if the constituent actions of each optimal joint action are unique, then each optimal joint action is an asymptotically stable equilibrium point. WRFMR is compared with other MARL algorithms on two multi-agent cooperative tasks, the box-pushing task and the DSN task, which verifies its effectiveness.

Finally, a multi-AGV warehouse simulation environment is designed and built, and WRFMR is compared with other MARL algorithms in it. The results show that WRFMR performs well in the multi-AGV storage system. In addition, FlexSim is used to visualize the learned policy, further verifying the effectiveness of WRFMR.
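The single-AGV idea (Q values inside the ant colony's transition rule, penalties for poor nodes, pheromone reinforcement of the best path of each generation) can be illustrated on a small grid. The abstract does not give the thesis's formulas, so every specific below is an assumption for illustration. The transition weight is assumed to be tau^alpha * eta^beta * exp(q_weight * Q). Q is learned with one-step Q-learning using a -1 step reward and a goal reward. A move into a dead end stands in for the "punished poor node". Each generation's ants are ranked by cumulative reward, which already charges -1 per move and so folds in path length. Pheromone is deposited in proportion to 1/length. The map `GRID`, the functions `find`, `neighbors` and `rl_aco`, and all parameter values are hypothetical.

```python
import math
import random

# Hypothetical warehouse layout: '#' = shelf/obstacle, 'S' = start, 'G' = goal.
GRID = [
    "S.....",
    ".####.",
    ".#....",
    ".#.##.",
    "...#..",
    "##...G",
]


def find(grid, ch):
    """Return the (row, col) of the first occurrence of ch."""
    for r, row in enumerate(grid):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(f"{ch!r} not in grid")


def neighbors(grid, cell):
    """Yield the 4-connected free neighbours of cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def rl_aco(grid, n_ants=20, n_iters=30, alpha=1.0, beta=2.0, q_weight=0.1,
           rho=0.3, deposit=10.0, lr=0.5, gamma=0.9,
           goal_reward=20.0, dead_end_penalty=-10.0, seed=0):
    """Hypothetical RL-augmented ant colony search; returns the best path found."""
    rng = random.Random(seed)
    start, goal = find(grid, "S"), find(grid, "G")
    free = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))
            if grid[r][c] != "#"]
    tau = {(a, b): 1.0 for a in free for b in neighbors(grid, a)}  # pheromone per edge
    Q = {edge: 0.0 for edge in tau}                                # Q value per edge

    def eta(cell):  # heuristic desirability: inverse Manhattan distance to goal
        return 1.0 / (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]) + 1)

    best = None
    for _ in range(n_iters):
        iter_best, iter_best_return = None, -math.inf
        for _ in range(n_ants):
            path, visited, steps = [start], {start}, []  # steps: (s, s', reward, terminal)
            while path[-1] != goal:
                s = path[-1]
                cand = [n for n in neighbors(grid, s) if n not in visited]
                if not cand:
                    # Dead end: punish the move that led into this "poor" node.
                    if steps:
                        steps[-1] = (steps[-1][0], steps[-1][1], dead_end_penalty, True)
                    break
                # Assumed transition rule: pheromone * heuristic * exp(weighted Q).
                w = [tau[(s, n)] ** alpha * eta(n) ** beta
                     * math.exp(q_weight * Q[(s, n)]) for n in cand]
                n = rng.choices(cand, weights=w)[0]
                steps.append((s, n, goal_reward if n == goal else -1.0, n == goal))
                path.append(n)
                visited.add(n)
            # One-step Q-learning over the ant's trajectory, replayed backwards.
            for s, n, r, terminal in reversed(steps):
                if terminal:
                    target = r
                else:
                    target = r + gamma * max(Q[(n, m)] for m in neighbors(grid, n))
                Q[(s, n)] += lr * (target - Q[(s, n)])
            ret = sum(step[2] for step in steps)  # cumulative reward (penalizes length)
            if path[-1] == goal and ret > iter_best_return:
                iter_best, iter_best_return = path, ret
        # Evaporate on every edge, then reinforce the iteration-best path.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        if iter_best is not None:
            for edge in zip(iter_best, iter_best[1:]):
                tau[edge] += deposit / (len(iter_best) - 1)
            if best is None or len(iter_best) < len(best):
                best = iter_best
    return best


path = rl_aco(GRID)
print(len(path) - 1, path)
```

`rl_aco` returns the best successful path as a list of (row, col) cells. On this map no route can be shorter than 10 moves, because that is the Manhattan distance from S to G, so the printed move count can be checked against that lower bound.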
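The multi-agent ideas summarized above can be sketched on a two-agent cooperative repeated game. These ideas are: Q values over each agent's own actions only, a weighted bonus based on how often an action has produced its maximal reward, an iterative estimate of that frequency, and probabilistic action selection. The abstract does not state WRFMR's exact update rules, so this sketch substitutes an FMQ-style evaluation, Q(a) + weight * freq(a) * maxR(a). The frequency is tracked as an exponential moving average, and actions are drawn by Boltzmann selection with a decaying temperature. It is not the thesis's algorithm. The class `MaxRewardFrequencyLearner`, the function `train`, and every parameter value are hypothetical. The payoff matrix is the climbing game, a common cooperative benchmark whose optimal joint action (0, 0) sits next to heavy miscoordination penalties.

```python
import math
import random

# Climbing game: common payoff for joint action (row agent's action, column agent's action).
CLIMBING = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


class MaxRewardFrequencyLearner:
    """Hypothetical independent learner: keeps values for its OWN actions only."""

    def __init__(self, n_actions, rng, alpha=0.1, beta=0.05, weight=10.0):
        self.rng = rng
        self.alpha = alpha    # step size for the Q estimate
        self.beta = beta      # step size for the iterative max-reward frequency estimate
        self.weight = weight  # weight on the max-reward bonus
        self.q = [0.0] * n_actions
        self.max_r = [-math.inf] * n_actions
        self.freq = [0.0] * n_actions

    def value(self, a):
        """Q(a) plus weight * freq(max reward | a) * max reward (FMQ-style)."""
        if self.max_r[a] == -math.inf:
            return self.q[a]
        return self.q[a] + self.weight * self.freq[a] * self.max_r[a]

    def act(self, temperature):
        """Boltzmann (softmax) selection over the heuristic values."""
        vals = [self.value(a) for a in range(len(self.q))]
        top = max(vals)  # subtract the max for numerical stability
        weights = [math.exp((v - top) / temperature) for v in vals]
        return self.rng.choices(range(len(vals)), weights=weights)[0]

    def greedy(self):
        return max(range(len(self.q)), key=self.value)

    def update(self, a, r):
        if r > self.max_r[a]:
            self.max_r[a] = r
            self.freq[a] = 0.0  # hits recorded for the old maximum no longer count
        hit = 1.0 if r == self.max_r[a] else 0.0
        # Iterative frequency estimate: O(1) memory per action, no reward histories.
        self.freq[a] += self.beta * (hit - self.freq[a])
        # Stateless repeated game, so the Q update has no bootstrap term.
        self.q[a] += self.alpha * (r - self.q[a])


def train(payoff, episodes=3000, seed=0, t0=50.0, decay=0.995, t_min=0.05):
    """Train two independent learners; return their greedy joint action."""
    rng = random.Random(seed)
    row = MaxRewardFrequencyLearner(len(payoff), rng)
    col = MaxRewardFrequencyLearner(len(payoff[0]), rng)
    for t in range(episodes):
        temp = max(t_min, t0 * decay ** t)  # anneal from exploration to exploitation
        a, b = row.act(temp), col.act(temp)
        r = payoff[a][b]  # both agents receive the same reward
        row.update(a, r)
        col.update(b, r)
    return row.greedy(), col.greedy()


print(train(CLIMBING, seed=0))
```

Each agent stores three numbers per individual action and never indexes the joint action space. Memory therefore grows with the number of an agent's own actions rather than exponentially with the number of agents, which is the scaling argument the abstract makes for WRFMR. The bonus term is what pulls both learners toward (0, 0). Without it, the -30 penalties drag Q(0) down while partners are still exploring.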
Keywords/Search Tags: reinforcement learning, multi-agent reinforcement learning, path planning, ant colony algorithm