
Research On Training And Decision-making Of Multi-agent Deep Reinforcement Learning

Posted on: 2024-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Xu
Full Text: PDF
GTID: 2568307157983329
Subject: Master of Electronic Information (Professional Degree)

Abstract/Summary:
With the development of artificial intelligence technology, Multi-Agent Deep Reinforcement Learning (MADRL) is increasingly used to solve practical problems such as autonomous driving, traffic scheduling, and power system optimization, and has therefore attracted growing attention. However, several issues still limit its performance. In value-function-decomposition-based MADRL, the max operator is typically used to compute the target Q-values, and the Q-function is approximated by neural networks subject to random errors; this leads to biased estimation of the target Q-values during training. In addition, the non-stationarity of the environment and the randomness of sample transitions can cause an agent's policy network or action-value network to fail in certain states, undermining the robustness of agent decision-making. Addressing these issues is of significant importance for improving the performance of multi-agent systems. To address the biased estimation of target Q-values during multi-agent training and the insufficient robustness of multi-agent decision-making, this study conducts the following research:

(1) To tackle the biased estimation of target Q-values during multi-agent training, a Multi-Agent Deep Reinforcement Learning method based on a target Q-value adjustment mechanism (QADJ) is proposed. The method designs upper and lower bounds for the target Q-values so that biased targets can be identified individually, devises an adjustment formula to compute the correction for targets with different degrees of bias, and introduces two bias-control mechanisms that prevent the bounds from drifting too far from the target Q-values and losing their constraining effect. The overall idea is to flag biased target Q-values using the bounds, compute an adjustment with the formula, and correct the targets by the computed amount; with the biased targets adjusted, the multi-agent system trains more accurately. Finally, comparison experiments against five baseline methods are conducted in five different multi-agent environments. The results show that the system using QADJ achieves the highest team reward or win rate in all environments and raises the win rate from 70% to 90%, effectively mitigating the biased estimation of target Q-values.

(2) To address the insufficient robustness of multi-agent decision-making, a decision optimization method based on ensemble learning for Multi-Agent Deep Reinforcement Learning (EDO) is proposed. The method trains multiple policy networks or action-value networks for each agent and lets the agent decide from the combined outputs of the ensemble, so that the failure of an individual network in certain states no longer compromises decision-making. Drawing on the characteristics of reinforcement learning, two ensemble schemes are designed: an ensemble based on action-confidence weighting and an ensemble based on optimal-action voting, both of which effectively integrate the outputs of multiple networks. The key idea of EDO is that when an individual policy or action-value network fails in a given state, the agent can still decide from the remaining functioning networks. EDO is widely applicable and can be used with actor-critic methods as well as value-function-decomposition methods. It is combined with two representative MADRL models (MADDPG and QMIX), and comparative experiments are conducted in three different multi-agent environments. The results demonstrate that the proposed method yields higher team rewards, raises the win rate from 23% to 88%, and enhances the robustness of multi-agent decision-making.
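The abstract does not give QADJ's formulas, so the following is only a minimal sketch of the bound-and-adjust idea it describes: targets outside an assumed [lower, upper] band are treated as biased and pulled back toward the band. The names `adjust_target_q` and the strength parameter `alpha`, and the use of fixed scalar bounds, are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def adjust_target_q(target_q, q_lower, q_upper, alpha=0.5):
    """Sketch of a QADJ-style target correction (assumed form).

    target_q: target Q-values produced by the max operator.
    q_lower / q_upper: assumed bounds (e.g. running statistics of past
    targets); values outside the band are treated as biased.
    alpha: assumed adjustment strength, scaling how far a biased target
    is pulled back toward the nearest bound.
    """
    target_q = np.asarray(target_q, dtype=float)
    # Degree of bias = distance outside the [lower, upper] band.
    over = np.maximum(target_q - q_upper, 0.0)
    under = np.maximum(q_lower - target_q, 0.0)
    # Larger bias yields a larger correction; in-band targets are unchanged.
    return target_q - alpha * over + alpha * under
```

Under this sketch, `adjust_target_q([5.0, 1.0, 3.0], 2.0, 4.0, alpha=0.5)` corrects the out-of-band targets to `[4.5, 1.5, 3.0]` while leaving the in-band target untouched, matching the abstract's description of adjusting only targets identified as biased.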
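The two EDO ensemble schemes named in the abstract can be sketched as follows; the concrete forms here (softmax over Q-values as "confidence", plurality voting over greedy actions) are plausible readings of "action-confidence weighting" and "optimal-action voting", not the thesis's verified definitions.

```python
import numpy as np

def confidence_weighted_action(q_values_list):
    """Assumed form of the action-confidence-weighting ensemble:
    each head's Q-values become a softmax confidence distribution,
    the distributions are averaged, and the argmax action is taken."""
    probs = []
    for q in q_values_list:
        q = np.asarray(q, dtype=float)
        e = np.exp(q - q.max())        # numerically stable softmax
        probs.append(e / e.sum())
    return int(np.argmax(np.mean(probs, axis=0)))

def majority_vote_action(q_values_list):
    """Assumed form of the optimal-action-voting ensemble: each head
    votes for its greedy action; the most-voted action wins
    (ties broken toward the lowest action id)."""
    votes = [int(np.argmax(q)) for q in q_values_list]
    return int(np.bincount(votes).argmax())
```

Either function realizes the robustness argument in the abstract: if one head produces degenerate Q-values in some state, the remaining heads still dominate the averaged confidences or the vote count, so the agent can still act sensibly.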
Keywords/Search Tags:Multi-Agent Systems, Deep Reinforcement Learning, Q-learning, Bias, Decision