
Research On Game Algorithm Based On Fictitious Self-play With Prioritized Experience Replay

Posted on: 2022-03-20    Degree: Master    Type: Thesis
Country: China    Candidate: D D Zhang    Full Text: PDF
GTID: 2518306569994549    Subject: Computer Science and Technology
Abstract/Summary:
As a key part of national macro-level development strategy, Artificial Intelligence (AI) strongly influences whether a country can play a leading role in international competition. Computer game playing is an important branch of AI and reflects the field's level of development. Computer games divide into perfect-information and imperfect-information games. Imperfect-information games are more complicated and have become a research hotspot because of their important applications in many areas of real life, such as economics and the military. Texas Hold'em is a representative research object for imperfect-information games: it involves uncertain, incomplete and unreliable information, no-limit betting, and multiple Nash equilibria arising from multiple players.

Current research on imperfect-information games, both domestic and foreign, follows three main directions: approaches based on game theory, approaches based on machine learning, and approaches combining the two. The most representative algorithm in the third direction is Neural Fictitious Self-Play (NFSP). It combines the advantages of the first two directions, namely the convergence guarantee of Nash-equilibrium theory and the ability to learn from scratch without prior knowledge. However, NFSP still has two problems: the high cost of interaction with the environment, and its limited range of application.

To address the high interaction cost of NFSP, this dissertation proposes and implements a prioritized experience replay mechanism and, based on it, two improved NFSP game algorithms, NFSP-PER and NFSP-PER-LT. Analyzing how experience is learned confirms its effect on learning efficiency, and in turn the effect of learning efficiency on interaction cost. The experience-learning process is optimized in both its sampling order and its learning depth: valuable experience is sampled first and learned more deeply, improving learning efficiency and reducing interaction cost, so the agent reaches a higher level of play than the original algorithm under the same training conditions. The proposed algorithms are evaluated on typical two-player zero-sum game platforms, where they defeat the current NFSP algorithm and two other algorithms, demonstrating higher intelligence.

To address the application limitation of NFSP, this dissertation applies the proposed algorithms to large-scale multi-player games and implements versions suitable for 3-player and 6-player ring no-limit Texas Hold'em. The multi-player game is transformed into an interaction between a single agent and its environment by modelling the extensive-form game as a Markov decision process; the interaction can then be seen as a game between the single agent and the environment. The strategy for the multi-player game is obtained by solving the Markov decision process determined by the strategy profile of the opponent agents. The proposed algorithms defeat the current NFSP algorithm and a random-strategy algorithm in multi-player no-limit Texas Hold'em.
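The prioritized replay idea described above, sampling valuable experience first and weighting its updates, can be sketched as a proportional-priority buffer. This is a minimal illustration under standard prioritized-experience-replay conventions, not the dissertation's implementation; the class name, the TD-error-based priority, and the `alpha`/`beta` hyperparameters are assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Ring buffer that samples transitions in proportion to their priority."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priority skews sampling (0 = uniform)
        self.buffer = []          # stored transitions
        self.priorities = []      # one priority per stored transition
        self.pos = 0              # next write position

    def add(self, transition, td_error=1.0):
        # Larger TD error -> more "valuable" experience -> higher priority.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        # Importance-sampling weights correct for the non-uniform sampling.
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]   # normalize to (0, 1]
        return [self.buffer[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```

In this scheme the transition with TD error 5.0 below is sampled far more often than the two with error 0.1, which is the "valuable experience is first sampled" behaviour the abstract describes.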
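The modelling step for the multi-player case, folding the opponents' fixed strategy profile into the environment so that the game becomes a single-agent Markov decision process, can be illustrated with a toy three-player game. Everything here (the toy game, the class names, the XOR payoff) is invented for illustration and is not taken from the dissertation.

```python
class ToyThreePlayerGame:
    """Tiny 3-player game: each seat picks 0 or 1; the agent at seat 0
    earns +1 iff its pick equals the XOR of the two opponents' picks."""

    def reset(self):
        self.picks = {}

    def apply(self, seat, action):
        self.picks[seat] = action

    def agent_reward(self):
        return 1 if self.picks[0] == (self.picks[1] ^ self.picks[2]) else 0


class SingleAgentMDP:
    """Folds fixed opponent policies into the environment, so seat 0 faces
    an ordinary single-agent decision process instead of a 3-player game."""

    def __init__(self, game, opponent_policies):
        self.game = game
        self.opponents = opponent_policies  # seat -> zero-argument policy

    def reset(self):
        self.game.reset()
        return ()  # trivial observation: nothing has happened yet

    def step(self, action):
        self.game.apply(0, action)
        # Opponents move according to their fixed strategy profile; from
        # seat 0's point of view this is just environment dynamics.
        for seat, policy in self.opponents.items():
            self.game.apply(seat, policy())
        return self.game.agent_reward(), True  # (reward, episode done)
```

Once the opponents' strategy profile is fixed, any single-agent method can be run against `SingleAgentMDP`, which mirrors the abstract's claim that the multi-player strategy is obtained by solving the MDP determined by the opponents' strategies.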
Keywords/Search Tags:computer game, imperfect information, fictitious self-play, experience replay