Research On Multi-agent Reinforcement Learning Based On Dynamic Optimistic Estimation

Posted on: 2024-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: R X Liu
Full Text: PDF
GTID: 2568306926975249
Subject: Computer technology
Abstract/Summary:
Multi-agent systems consist of multiple agents that optimize their policies through dynamic interaction with the environment in order to accomplish complex global tasks. Existing studies have shown that reinforcement learning can effectively enhance the learning and decision-making abilities of each agent. However, multi-agent reinforcement learning still faces two problems: first, after the dynamic environment changes, the agents' policies cannot be updated in time; second, the agents tend to ignore the differences in importance among the samples in the experience pool. To address these two problems, this thesis uses the SMAC StarCraft Ⅱ multi-agent experimental environment, a game simulation system, and proposes the following methods.
(1) To address the dynamic mismatch in complex environments, a multi-agent cooperative confrontation algorithm based on dynamic optimistic estimation is proposed. The algorithm incorporates a dynamic optimistic estimation module into the actor-critic framework. The module judges whether the current situation is optimistic based on the historical rewards of the environment, and builds and trains a model that determines the optimism parameter. With this module, the actor-critic network can adjust its policy in time to achieve the best trade-off between exploration and exploitation. The experimental results show that, in the SMAC StarCraft Ⅱ environment, the stability and convergence speed of the algorithm on the 5m_vs_6m map are better than those of current mainstream classic algorithms.
(2) To address the differing importance of samples in the multi-agent experience pool, a dynamic optimistic estimation multi-agent reinforcement learning algorithm based on priority experience replay is proposed. By comparing the similarity between states and the similarity between state transition tuples, the algorithm selects better state transition tuples according to the current state and stores them in the experience replay pool, thereby improving both the diversity and the quality of the samples in the pool. The experimental results show that the method achieves good stability and a good win rate in the SMAC environment.
In summary, by proposing a multi-agent cooperative confrontation algorithm based on dynamic optimistic estimation and a dynamic optimistic estimation multi-agent reinforcement learning algorithm based on priority experience replay, this thesis addresses the dynamic mismatch and sample importance difference problems faced by multi-agent reinforcement learning. The experimental results show that the improved algorithms perform well, which provides a useful reference for subsequent research on and application of multi-agent reinforcement learning.
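The abstract does not give formulas for either method, so the following is only an illustrative sketch of the two ideas, not the thesis's implementation. Everything in it is an assumption made for illustration: the class names `OptimismEstimator` and `DiverseReplayBuffer`, the sigmoid heuristic that replaces the trained optimism-parameter model, the `blended_value` interpolation, and the Euclidean distance threshold used to decide whether two transition tuples are too similar.

```python
import math
from collections import deque


class OptimismEstimator:
    """Hypothetical heuristic: map the recent reward trend to a value in (0, 1).

    The thesis trains a model for this; a fixed sigmoid is used here instead.
    """

    def __init__(self, window=5, temperature=1.0):
        self.recent = deque(maxlen=window)  # short-term reward history
        self.total = 0.0                    # sum of all rewards seen so far
        self.count = 0
        self.temperature = temperature

    def update(self, reward):
        self.recent.append(reward)
        self.total += reward
        self.count += 1

    def coefficient(self):
        if self.count == 0:
            return 0.5  # no history yet: stay neutral
        long_run = self.total / self.count
        short_run = sum(self.recent) / len(self.recent)
        # Above 0.5 when recent rewards beat the long-run average.
        gap = (short_run - long_run) / self.temperature
        return 1.0 / (1.0 + math.exp(-gap))


def blended_value(q_low, q_high, beta):
    """Interpolate between a pessimistic and an optimistic value estimate."""
    return (1.0 - beta) * q_low + beta * q_high


class DiverseReplayBuffer:
    """Stores a transition only if no similar one is already in the pool."""

    def __init__(self, capacity, min_distance):
        self.buffer = deque(maxlen=capacity)
        self.min_distance = min_distance

    @staticmethod
    def _distance(a, b):
        # Assumed similarity measure: Euclidean distance over the
        # concatenated (state, next_state) vectors of two transitions.
        return math.dist(a[0] + a[3], b[0] + b[3])

    def add(self, state, action, reward, next_state):
        candidate = (tuple(state), action, reward, tuple(next_state))
        for stored in self.buffer:
            same_action = stored[1] == action
            if same_action and self._distance(stored, candidate) < self.min_distance:
                return False  # near-duplicate: skip it to keep the pool diverse
        self.buffer.append(candidate)
        return True
```

In this sketch, a coefficient above 0.5 would shift a critic's target toward the higher of two value estimates, and the buffer's `add` returns `False` when a transition is rejected as redundant. Both thresholds (`temperature`, `min_distance`) are free parameters that would need tuning for any real SMAC map.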
Keywords/Search Tags:Reinforcement learning, Multi-agent systems, Dynamic optimistic estimate, Priority experience replay