
Research On Optimization Of Command Policy Based On Reinforcement Learning

Posted on: 2022-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: J M Yang
Full Text: PDF
GTID: 2518306335451884
Subject: Control theory and control engineering
Abstract/Summary:
Reinforcement learning has repeatedly produced landmark results in artificial intelligence: DeepMind's AlphaGo and AlphaGo Zero defeated the world Go champion, OpenAI's agents defeated top professional players in Dota 2, and reinforcement learning agents have performed excellently in Atari games. Traditional reinforcement learning algorithms can learn good decisions and converge well in most simple environments. However, when the task environment becomes complex and changeable and the optimal strategy must be obtained in real time, these algorithms are limited. Moreover, many real-world problems cannot be solved by a single agent, so more and more researchers are turning their attention to multi-agent environments and algorithms.

As a simulated war strategy game, StarCraft combines several complicating factors: incomplete information, a huge and complex action space, strict real-time requirements, and multi-agent cooperation. Together these make it difficult to learn an overall command strategy. As the environment becomes more complex and the tasks harder, it becomes difficult to obtain correct, effective learning data and positive reward feedback, and it is also difficult for traditional reinforcement learning models to learn from these data effectively. As a result, a well-performing model cannot be obtained. This thesis therefore studies how to acquire and learn from experience samples more efficiently in a complex environment, in order to improve the decision-making ability of the model. StarCraft II is selected as the simulation platform for algorithm research in a complex multi-agent environment, and three main pieces of work are carried out:

1. In multi-agent cooperative tasks, a parameter and experience sharing mechanism is added between agents, so that parameter updates and experience data are shared among the agents and each agent can learn from every agent's data.
2. For multiple tasks of the same category, a policy sharing mechanism is added, so that each task learns not only its own effective policy but also the excellent policies of other tasks. The part common to the tasks does not have to be learned from scratch by each task, which improves the learning efficiency of the common strategies.
3. The two mechanisms above are combined to realize information sharing among multiple agents with different tasks. This improves communication between agents and also expands the learnable data samples exponentially, allowing more thorough exploration of the task environments.

The tasks are divided into easy and difficult levels, and the incremental training method of curriculum transfer is used to verify the overall model from easy to difficult. Seven mini-game environments are selected to verify the research content above. First, the two sharing mechanisms are verified separately and each is shown to be effective on its own; then the integrated architecture is verified. Across multiple experiments, the information sharing mechanism greatly improves the reward feedback of the tasks, and the game-round scores also improve to varying degrees.
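The parameter and experience sharing idea in the first contribution can be illustrated with a minimal sketch. The abstract does not state which learning algorithm or network the thesis uses, so tabular Q-learning is swapped in here purely for illustration. The names `SharedExperience` and `SharedQAgent` are hypothetical and do not come from the thesis. The agents hold the same value table (shared parameters) and the same replay buffer (shared experience), so a transition collected by one agent can update what every agent knows.

```python
import random
from collections import defaultdict, deque


class SharedExperience:
    """Replay buffer that every agent writes to and samples from (hypothetical name)."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # transition = (state, action, reward, next_state)
        self.buffer.append(transition)

    def sample(self, k, rng):
        # Draw up to k transitions without replacement.
        items = list(self.buffer)
        return rng.sample(items, min(k, len(items)))


class SharedQAgent:
    """Tabular Q-learning agent; an assumption, not the thesis's model."""

    def __init__(self, q_table, experience, actions, alpha=0.5, gamma=0.9):
        self.q = q_table              # same dict object for all agents: shared parameters
        self.experience = experience  # same buffer for all agents: shared experience
        self.actions = actions
        self.alpha = alpha
        self.gamma = gamma

    def observe(self, state, action, reward, next_state):
        # Store this agent's own transition where every agent can reach it.
        self.experience.add((state, action, reward, next_state))

    def learn(self, batch_size, rng):
        # Standard Q-learning update on transitions collected by any agent.
        for s, a, r, s2 in self.experience.sample(batch_size, rng):
            best_next = max(self.q[(s2, b)] for b in self.actions)
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = defaultdict(float)
    experience = SharedExperience()
    actions = ["move", "attack"]
    agent_a = SharedQAgent(q_table, experience, actions)
    agent_b = SharedQAgent(q_table, experience, actions)

    agent_a.observe("s0", "move", 1.0, "s1")  # only agent A acts
    agent_b.learn(batch_size=1, rng=rng)      # agent B learns from A's data
    print(agent_a.q[("s0", "move")])          # A sees the update made by B
```

In a deep reinforcement learning setting the shared table would correspond to shared network weights, but that mapping is an assumption and is not confirmed by the abstract.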
Keywords/Search Tags: Reinforcement Learning, Parameter Experience Sharing, Policy Sharing, Command and Decision-making, StarCraft II