
Research On Incomplete Information Game Based On Improved Proximal Policy Optimization Algorithm

Posted on: 2022-03-05    Degree: Master    Type: Thesis
Country: China    Candidate: Z K He    Full Text: PDF
GTID: 2518306569997579    Subject: Computer technology
Abstract/Summary:
Machine game playing is an important research direction in artificial intelligence and a touchstone for the field's level of development. Deep reinforcement learning offers new ideas for finding optimal strategies in game scenarios with extremely large state spaces: trained agents have achieved superhuman results in complex games such as Go, StarCraft, and DOTA2. How to apply reinforcement learning to incomplete information games is a key and difficult problem in machine game research. On the one hand, because the environment of an incomplete information game is more complex and model convergence is harder, training an agent requires a large number of learning samples, so the algorithm's sampling utilization urgently needs to be improved. On the other hand, the exploration-exploitation balance problem arises during agent training: the agent easily falls into a local optimum, and a better exploration-exploitation mechanism is needed to help it escape locally optimal strategies.

Aiming at the problem of low sampling utilization in incomplete information game environments, a model-based proximal policy optimization algorithm (MB-PPO) is proposed. Depending on whether an environment model can be accessed or learned, deep reinforcement learning is divided into model-free and model-based methods. MB-PPO integrates the model-based imagination-augmented agent algorithm with the model-free proximal policy optimization algorithm: it exploits the high sampling utilization of model-based methods to improve sample efficiency, while retaining the good training performance and low computational requirements of the standard proximal policy optimization algorithm, improving overall performance.

Secondly, aiming at the problem of easily falling into local optima in incomplete information game environments, this dissertation proposes proximal policy optimization methods based on space optimization. First, from the perspective of action space optimization, drawing on decision tree methods and combining them with the idea of game confrontation, an adversarial promotion framework is proposed. The framework uses multiple policies to handle different action decisions and uses reward reshaping to make the policies confront one another, thereby raising the level of decision-making. Second, from the perspective of state space optimization, using the idea of hierarchical learning, a hierarchical goal proximal policy optimization algorithm is proposed. The upper-level policy provides goals for the lower-level policy, and these goals serve as the lower-level policy's intrinsic motivation for exploring the real environment. The upper-level policy judges the macro situation, while the lower-level policy interacts with the real environment to handle micro-level action decisions, thereby enhancing the agent's overall performance.

Finally, this dissertation verifies that the MB-PPO algorithm outperforms the PPO algorithm in the Pac-Man and Pommerman environments, and verifies on the Pommerman experimental platform that the adversarial promotion framework and the hierarchical goal proximal policy optimization algorithm play a positive role in balancing exploration and exploitation and improving agent performance.
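All of the methods above build on proximal policy optimization. As context, here is a minimal sketch of PPO's clipped surrogate objective, assuming NumPy only; the function name `ppo_clip_loss` and the interface are illustrative, not taken from the dissertation:

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate loss of PPO, written as a quantity to minimize.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the behavior policy; advantages: estimated advantages.
    """
    ratio = np.exp(new_logp - old_logp)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the mean of the elementwise minimum; returning the
    # negative mean turns that into a loss for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

In the variants described in the abstract, this objective is presumably the shared foundation; the proposed innovations concern how training samples are generated and how rewards and goals are shaped around it.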
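The model-based side of MB-PPO can be illustrated with a toy sketch in which a dynamics model generates imagined transitions that supplement each real one, raising sampling utilization. Everything here is a hand-coded stand-in (the 1-D walk, `model_step`, the mixing ratio), not the dissertation's implementation:

```python
def real_step(state, action):
    """Stand-in for the true environment: a 1-D walk with reward at cell 3."""
    nxt = state + action
    return nxt, 1.0 if nxt == 3 else 0.0

def model_step(state, action):
    """Stand-in for a learned dynamics model (here it happens to be exact)."""
    return real_step(state, action)

def collect(state, policy, n_real, n_imagined_per_real):
    """Mix each real transition with several model-generated ones."""
    buffer = []
    for _ in range(n_real):
        a = policy(state)
        nxt, r = real_step(state, a)
        buffer.append((state, a, r, nxt, "real"))
        # Imagination: branch extra rollouts from the same state via the model,
        # so the learner sees more transitions per real environment step.
        for _ in range(n_imagined_per_real):
            ia = policy(state)
            inxt, ir = model_step(state, ia)
            buffer.append((state, ia, ir, inxt, "imagined"))
        state = nxt
    return buffer

buffer = collect(0, policy=lambda s: 1, n_real=3, n_imagined_per_real=2)
```

With two imagined transitions per real one, the buffer holds three times as many samples as environment interactions, which is the sense in which model-based rollouts improve sampling utilization.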
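The hierarchical goal scheme can likewise be sketched on a toy 1-D track: an upper-level policy periodically sets a macro goal, and a lower-level policy acts every step, rewarded intrinsically for reaching that goal. The policies, the re-planning interval, and the reward values are all illustrative assumptions:

```python
def upper_policy(state):
    """Hypothetical upper-level policy: pick a macro goal (a target cell)."""
    return (state + 2) % 5  # illustrative: aim two cells ahead on a 5-cell track

def lower_policy(state, goal):
    """Hypothetical lower-level policy: move one step toward the goal."""
    return 1 if goal > state else -1

def intrinsic_reward(state, goal):
    """The lower level is rewarded for reaching the upper level's goal."""
    return 1.0 if state == goal else -0.1

# Rollout: the upper level re-plans every 2 steps; the lower level acts
# each step against the (stand-in) real environment.
state, total = 0, 0.0
for t in range(10):
    if t % 2 == 0:
        goal = upper_policy(state)
    state = max(0, min(4, state + lower_policy(state, goal)))
    total += intrinsic_reward(state, goal)
```

The division of labor mirrors the abstract's description: the upper level judges the macro situation by choosing goals, while the lower level handles micro-level action decisions in the real environment.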
Keywords/Search Tags: PPO, reinforcement learning, sampling utilization, game