
Research On Multi-Agent Deep Reinforcement Learning Algorithm With Collision Times Constraints

Posted on: 2024-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Yuan | Full Text: PDF
GTID: 2542307151960659 | Subject: Computer Science and Technology
Abstract/Summary:
An intelligent workshop is a highly automated site in which different agents collaborate to complete tasks such as transportation and manufacturing. This thesis studies the application of multi-agent deep reinforcement learning in intelligent manufacturing workshops. In such workshops, the environment and the task requirements typically impose various constraints on the agents, so this thesis focuses on how agents complete their transportation tasks under a constraint on the number of collisions.

First, the thesis abstractly models the transportation task of an intelligent manufacturing workshop and constructs two constrained multi-agent environments: cooperative navigation and cargo handling.

Second, to address value-function overestimation and the fragility of the temperature parameter in maximum-entropy methods within multi-agent deep reinforcement learning, the thesis proposes the MACDAC-A algorithm. A double-attention critic network trains two critics with attention mechanisms, whose combined estimate reduces the overestimation passed to the policy network, improving its output so that agents take better actions while avoiding collisions. In addition, an adaptive-entropy reinforcement learning method adjusts each agent's temperature parameter according to its current policy entropy, dynamically weighting the importance of the entropy term for adaptive exploration and making the algorithm more robust. Comparative experiments in the two environments verify the effectiveness of MACDAC-A.

Finally, to address the low sample utilization of the experience replay buffer in off-policy actor-critic reinforcement learning, together with the fact that the environments considered impose constraints on the agents, the thesis proposes the MACAAC-PCAL algorithm. Conservative advantage learning improves sample efficiency by reshaping the agents' reward values to widen the gap between the optimal action and the other actions. A penalized action-selection approach then makes each agent consider, when choosing an action, not only the reward for its task but also whether the constraint is satisfied, so that the agent copes better with a constrained environment. The effectiveness of MACAAC-PCAL is verified in both environments.
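The two MACDAC-A mechanisms described above can be sketched in isolation. The snippet below is a minimal illustration, not the thesis's implementation: it omits the attention networks entirely and shows only the clipped double-critic estimate (the standard way two critics curb overestimation) and a SAC-style gradient step on the temperature, driven by the gap between current and target policy entropy. All function and parameter names are illustrative.

```python
import numpy as np

def clipped_double_q(q1, q2):
    """Combine two critic estimates by taking their element-wise
    minimum, the usual clipped double-Q trick for reducing the
    overestimation that a single critic passes to the policy."""
    return np.minimum(q1, q2)

def adaptive_temperature(alpha, policy_entropy, target_entropy, lr=1e-3):
    """One gradient-descent step on the SAC-style temperature loss
    J(alpha) = alpha * (H(pi) - H_target): alpha shrinks when the
    policy already explores more than the target, and grows when
    its entropy falls below the target."""
    grad = policy_entropy - target_entropy
    return max(alpha - lr * grad, 1e-6)  # keep the temperature positive
```

In a multi-agent setting each agent would hold its own `alpha` and update it from its own policy entropy, which is what makes the entropy weighting per-agent and adaptive rather than a single fragile shared hyperparameter.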
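The two MACAAC-PCAL ideas can likewise be sketched. This is a toy illustration under assumed names (`violation_costs`, `limit`, `lam` are not from the thesis): the first function applies an advantage-learning-style reshaping that pushes suboptimal actions further below the greedy one, widening the action gap; the second selects the action that maximizes value minus a penalty on predicted constraint violations (e.g. collisions) beyond an allowed budget.

```python
import numpy as np

def advantage_learning_target(q_values, action, beta=0.5):
    """Advantage-learning reshaping: subtract a fraction of the
    (non-positive) gap between the chosen action's value and the
    best value, so non-optimal actions are trained toward lower
    targets and the optimal action stands out more clearly."""
    gap = q_values[action] - q_values.max()  # <= 0, zero for the greedy action
    return q_values[action] + beta * gap

def penalized_action(q_values, violation_costs, limit, lam=1.0):
    """Pick the action maximizing task value minus a penalty on
    expected constraint violations that exceed the allowed budget
    `limit`; actions within budget are judged on value alone."""
    penalty = lam * np.maximum(violation_costs - limit, 0.0)
    return int(np.argmax(q_values - penalty))
```

The design intent matches the abstract: reshaping sharpens the preference for the optimal action (better use of each replayed sample), while the penalty term makes constraint satisfaction part of action selection rather than an afterthought.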
Keywords/Search Tags: Multi-agent Systems, Deep Reinforcement Learning, Attention Mechanisms, Information Entropy, Actor-Critic