
Research And Realization Of Complete Information Game Theory Based On Reinforcement Learning

Posted on: 2022-03-27 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Zhang | Full Text: PDF
GTID: 2480306605471804 | Subject: Master of Engineering
Abstract/Summary:
As a product of the combination of modern computer technology and game theory, computer games have become one of the most popular research directions in artificial intelligence. Complete information games have become the main research direction in computer games because of their wide applicability. In recent years, with the development of artificial intelligence technology, deep learning, reinforcement learning, and other algorithms have been widely used to solve all kinds of complete information game problems. In particular, the combination of neural network training and reinforcement learning has produced a variety of game-playing AI algorithms, for example for Go, one of the representative complete information games. The series of AlphaGo versions developed by the DeepMind team successively defeated top world players such as Lee Sedol and Ke Jie, pushing AI to a new stage of development. Results from complete information games such as Go can also be applied to other fields, such as chess deduction, intelligent decision-making systems, financial decision-making, and robot control.

Gomoku, another typical complete information game, has become the second most popular board game in the world, after chess, because its rules are simple and easy to understand. It is typical of agglomeration games, is easy to study, and quickly reflects the intelligence level and speed of model training. Therefore, based on the theory of reinforcement learning and complete information games, we use Gomoku as a model to conduct in-depth research on the evolution speed of training and the diversity of game records. The main research contents and innovations are as follows:

(1) Under ordinary computer hardware conditions, we combine deep learning algorithms with Monte Carlo tree search, as used in reinforcement learning, to train the Gomoku model. We also address shortcomings of existing methods: for example, we assign a small initial value to the probability distribution over unoccupied positions computed by the policy network, and we use different move strategies depending on the number of steps played in a game. We analyze the changes in the loss of the network trained by this method, the changes in the playing level of the model, and the evolution of the move strategy in the game records to verify the feasibility of the optimization. The resulting Gomoku AI model is stronger than ordinary human players.

(2) To address the low efficiency of Monte Carlo tree search under multithreading, we propose a Monte Carlo tree search method based on a falling-leaves mechanism. From the start of a generation to the end of that generation's self-play, each thread inherits only one search tree and, at an appropriate time determined by the total number of nodes in the tree, drops nodes with lower value information, thereby increasing the search depth and width of the tree. Experimental analysis shows that the algorithm effectively improves the evolution speed of the network without increasing training time, gives the Gomoku AI model a self-learning effect during self-play, enriches the diversity of game records, and reduces over-fitting.

(3) To address the bottleneck problem of the Gomoku AI model, we propose a dual-chain parallel neural network structure. The shared part of the original network is changed from a single-chain serial structure to a dual-chain parallel structure. Experiments show that the dual-chain model evolves faster, and that adding the dual chain effectively reduces over-fitting when the model's playing level reaches a bottleneck and enhances the model's ability to evolve.
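
The falling-leaves mechanism described in (2) can be illustrated with a minimal sketch. The abstract does not give the pruning rule, so everything concrete here is an assumption made for illustration: the hypothetical `Node` class, the use of the visit count as a stand-in for a node's "value information", and the hypothetical `max_nodes` threshold that decides when leaves are dropped. It is not the thesis's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical search-tree node; `visits` is an assumed proxy for value information."""
    visits: int = 0
    parent: Optional["Node"] = None
    children: Dict[int, "Node"] = field(default_factory=dict)

    def add_child(self, move: int, visits: int = 0) -> "Node":
        child = Node(visits=visits, parent=self)
        self.children[move] = child
        return child


def all_nodes(root: Node) -> List[Node]:
    """Collect every node in the tree with an iterative depth-first walk."""
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(node.children.values())
    return seen


def drop_leaves(root: Node, max_nodes: int) -> int:
    """Drop the least-visited leaves until the tree holds at most `max_nodes` nodes.

    Returns the number of nodes dropped. The root is never dropped.
    """
    dropped = 0
    total = len(all_nodes(root))
    while total > max_nodes:
        leaves = [n for n in all_nodes(root)
                  if not n.children and n.parent is not None]
        if not leaves:
            break  # only the root is left
        victim = min(leaves, key=lambda n: n.visits)
        # Detach the chosen leaf from its parent.
        for move, child in list(victim.parent.children.items()):
            if child is victim:
                del victim.parent.children[move]
                break
        total -= 1
        dropped += 1
    return dropped
```

In this sketch a thread would call `drop_leaves` on its single inherited tree whenever the node count passes the threshold, so that the node budget is spent on the more frequently visited branches. A real implementation would likely track the node count incrementally instead of re-walking the tree on every drop.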
Keywords/Search Tags: Complete information game, Reinforcement learning, Gomoku game, Monte Carlo Tree Search, Falling leaves mechanism, Dual-chain parallel