
Research On Multi-Agent Deep Reinforcement Learning Algorithm With Collision Times Constraints

Posted on: 2024-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Yuan | Full Text: PDF
GTID: 2542307151960659 | Subject: Computer Science and Technology
Abstract/Summary:
An intelligent workshop is a highly automated site in which different agents collaborate to complete tasks such as transportation and manufacturing. This thesis studies the application of multi-agent deep reinforcement learning in intelligent manufacturing workshops. In such workshops, the environment and the task requirements typically impose various constraints on the agents, so this thesis focuses on how agents complete their transportation tasks under a constraint on the number of collisions.

First, the thesis abstractly models the transportation task of an intelligent manufacturing workshop and constructs two constrained multi-agent environments: cooperative navigation and cargo handling.

Second, to address value-function overestimation and the fragility of the temperature parameter in maximum-entropy methods within multi-agent deep reinforcement learning, the thesis proposes the MACDAC-A algorithm. A double-attention critic network trains two critics with attention mechanisms, whose combined estimate reduces the overestimation passed to the policy network, improving its output so that agents take better actions while avoiding collisions. In addition, an adaptive-entropy reinforcement learning method adjusts each agent's temperature parameter according to its current policy entropy, dynamically weighting the importance of the entropy term for adaptive exploration and making the algorithm more robust. Comparative experiments in the two environments verify the effectiveness of MACDAC-A.

Finally, to address the low sample utilization of the experience replay buffer in off-policy actor-critic reinforcement learning, together with the fact that the environments considered impose constraints on the agents, the thesis proposes the MACAAC-PCAL algorithm. Conservative advantage learning improves sample efficiency by reshaping the agents' reward values to widen the gap between the optimal action and the other actions. A penalized action-selection approach then makes each agent consider, when choosing an action, not only the reward for its task but also whether the constraint is satisfied, so that the agent copes better with a constrained environment. The effectiveness of MACAAC-PCAL is verified in both environments.
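The two MACDAC-A mechanisms described above can be sketched in isolation. The snippet below is a minimal illustration, not the thesis's implementation: it omits the attention networks entirely and shows only the clipped double-critic estimate (the standard way two critics curb overestimation) and a SAC-style gradient step on the temperature, driven by the gap between current and target policy entropy. All function and parameter names are illustrative.

```python
import numpy as np

def clipped_double_q(q1, q2):
    """Combine two critic estimates by taking their element-wise
    minimum, the usual clipped double-Q trick for reducing the
    overestimation that a single critic passes to the policy."""
    return np.minimum(q1, q2)

def adaptive_temperature(alpha, policy_entropy, target_entropy, lr=1e-3):
    """One gradient-descent step on the SAC-style temperature loss
    J(alpha) = alpha * (H(pi) - H_target): alpha shrinks when the
    policy already explores more than the target, and grows when
    its entropy falls below the target."""
    grad = policy_entropy - target_entropy
    return max(alpha - lr * grad, 1e-6)  # keep the temperature positive
```

In a multi-agent setting each agent would hold its own `alpha` and update it from its own policy entropy, which is what makes the entropy weighting per-agent and adaptive rather than a single fragile shared hyperparameter.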
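The two MACAAC-PCAL ideas can likewise be sketched. This is a toy illustration under assumed names (`violation_costs`, `limit`, `lam` are not from the thesis): the first function applies an advantage-learning-style reshaping that pushes suboptimal actions further below the greedy one, widening the action gap; the second selects the action that maximizes value minus a penalty on predicted constraint violations (e.g. collisions) beyond an allowed budget.

```python
import numpy as np

def advantage_learning_target(q_values, action, beta=0.5):
    """Advantage-learning reshaping: subtract a fraction of the
    (non-positive) gap between the chosen action's value and the
    best value, so non-optimal actions are trained toward lower
    targets and the optimal action stands out more clearly."""
    gap = q_values[action] - q_values.max()  # <= 0, zero for the greedy action
    return q_values[action] + beta * gap

def penalized_action(q_values, violation_costs, limit, lam=1.0):
    """Pick the action maximizing task value minus a penalty on
    expected constraint violations that exceed the allowed budget
    `limit`; actions within budget are judged on value alone."""
    penalty = lam * np.maximum(violation_costs - limit, 0.0)
    return int(np.argmax(q_values - penalty))
```

The design intent matches the abstract: reshaping sharpens the preference for the optimal action (better use of each replayed sample), while the penalty term makes constraint satisfaction part of action selection rather than an afterthought.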
Keywords/Search Tags: Multi-agent Systems, Deep Reinforcement Learning, Attention Mechanisms, Information Entropy, Actor-Critic