Research On AGV Path Planning Based On Cooperative Multi-agent Reinforcement Learning

Posted on: 2024-05-17 | Degree: Master | Type: Thesis
Country: China | Candidate: D Y Liao | Full Text: PDF
GTID: 2568307148962679 | Subject: Electronic information
Abstract/Summary:
As industrial automation advances, Automated Guided Vehicles (AGVs) have become key equipment in the logistics industry, widely used for handling materials and products. Planning AGV travel routes well can effectively improve material-handling efficiency. Because reinforcement learning learns autonomously, it can improve the efficiency of AGVs in complex, dynamic environments. This thesis therefore proposes an AGV path planning method based on multi-agent reinforcement learning. The main work is as follows.

First, the value decomposition algorithms QMIX and QTRAN cannot balance training speed and stability. To address this, the thesis proposes a multi-agent deep reinforcement learning algorithm called QTRAN Plus. It improves on QTRAN by replacing the sum of the agents' Q-value networks with a hybrid network, which strengthens the network's approximation and optimization capability. A new loss function is proposed for training the hybrid network and all agents' Q-value networks to speed up convergence. Simulation and ablation experiments show that QTRAN Plus outperforms other algorithms in cooperative robot handling tasks.

Second, in traditional tabular Q-learning, finding the action with the maximum Q-value requires traversing and comparing the Q-value of every action in the Q-table, which is computationally expensive. To address this, the thesis proposes T2Q, a multi-agent reinforcement learning algorithm based on improved tabular Q-learning. T2Q uses a centralized training, decentralized execution framework and reduces the cost of the traversal by storing the two highest Q-values for each state, which improves the algorithm's efficiency in research-oriented use. Theoretical analysis proves that T2Q is superior to traditional tabular Q-learning in computational complexity and convergence speed. Simulation experiments show that T2Q converges to the optimal joint policy with a 100% success rate on both platforms.

Finally, a multi-AGV warehouse simulation platform is designed and developed to verify the effectiveness of the proposed reinforcement learning methods on multi-AGV path planning problems. Simulation results show that, compared with QMIX and VDN, the proposed T2Q and QTRAN Plus algorithms converge to the optimal policy more quickly. In addition, the learned policies are visualized in the logistics simulation software Flexsim, providing an intuitive check of the algorithms' optimality.
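The abstract gives no implementation details for T2Q, so the following is only a minimal single-agent sketch of one plausible reading of "storing the two highest Q-values for each state". It caches the best and runner-up actions per state so that the greedy lookup avoids a full scan of the row in the common case. The class name `Top2QTable`, its methods, the hyperparameters, and the fallback rescan when the cache can no longer be trusted are all assumptions made for illustration. They are not taken from the thesis, and the centralized training, decentralized execution part is not modeled.

```python
import random
from collections import defaultdict


class Top2QTable:
    """Hypothetical tabular Q-learner caching (best, runner-up) actions per state."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        assert n_actions >= 2, "the top-two cache needs at least two actions"
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.top2 = {}      # state -> (best_action, second_action)

    def _rescan(self, state):
        # Full scan of the row; used for new states and as a fallback.
        row = self.q[state]
        order = sorted(range(self.n_actions), key=lambda a: row[a], reverse=True)
        self.top2[state] = (order[0], order[1])
        return self.top2[state]

    def greedy(self, state):
        # Constant-time lookup once the state has a cached pair.
        if state not in self.top2:
            self._rescan(state)
        return self.top2[state][0]

    def max_q(self, state):
        return self.q[state][self.greedy(state)]

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state, done=False):
        # Standard Q-learning target; the max comes from the cache.
        target = reward + (0.0 if done else self.gamma * self.max_q(next_state))
        row = self.q[state]
        best, second = self.top2.get(state) or self._rescan(state)
        old = row[action]
        row[action] = old + self.alpha * (target - old)
        new = row[action]

        if action == best:
            # The best action stays best unless it drops below the runner-up.
            if new < row[second]:
                self._rescan(state)
        elif action == second:
            if new > row[best]:
                self.top2[state] = (action, best)
            elif new < old:
                # The runner-up fell; another action may have overtaken it.
                self._rescan(state)
        else:
            if new > row[best]:
                self.top2[state] = (action, best)
            elif new > row[second]:
                self.top2[state] = (best, action)
```

In this sketch the second entry is what lets the cache tell whether a lowered best value is still the maximum without scanning the row. A full rescan is only needed when a cached action falls far enough that an uncached action might have overtaken it.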
Keywords/Search Tags: automated guided vehicle, path planning, multi-agent reinforcement learning, multi-agent deep reinforcement learning