
Research On Gomoku Algorithm Based On Deep Reinforcement Learning

Posted on: 2020-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2428330623962069
Subject: Engineering
Abstract/Summary:
A long-term goal of artificial intelligence is to start from “zero” and reach superhuman performance in the most challenging domains. Starting from “zero” means that no human knowledge is required: the system keeps improving through self-learning. This is an idea of general-purpose artificial intelligence rather than adaptation to one specific domain, and it opens the possibility of solving a wider range of practical problems. AlphaZero has succeeded in Go, chess, and shogi (Japanese chess), an impressive achievement and a landmark of general artificial intelligence. Its main innovation is the combination of deep reinforcement learning with Monte Carlo tree search (MCTS). This combination is instructive and leaves ample room for further exploration, and this thesis makes an active attempt in that direction.

Gomoku is a board game whose rules are simple but which is hard to master. On a board of the same size, its complexity approaches that of Go. At present, even the strongest Gomoku program holds no absolute advantage over top human players: in recent matches, each side has both won and lost games. This shows that Gomoku only appears simple and is in fact difficult.

Based on deep reinforcement learning theory and drawing on the AlphaZero algorithm, this thesis explores Gomoku AI. We propose new designs for both the board-state representation and the structure of the policy-value network, with the aim of improving the training speed and convergence accuracy of the policy-value network and, ultimately, achieving strong playing strength.

For the board-state representation, this thesis proposes two new designs. First, on top of the basic description of the stone configuration, multiple feature planes recording the most recent moves are introduced; we call this N-step history. Under the Markov decision process formulation these extra planes appear redundant, since the current stone configuration already determines the state, but experiments show that they significantly speed up the convergence of the policy-value network. Second, based on the characteristics of Gomoku, a representation based on regional value subdivision is proposed. Experiments show that it improves the convergence accuracy of the network and yields stronger play.

For the network structure, this thesis designs an Inception module to strengthen the perception of stone patterns, improving the network's combined perception of global and local patterns. We call the whole network the “composite visual field network”. Experiments show that it achieves better convergence accuracy and stronger play.
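The N-step history idea can be illustrated with a small input encoder. The abstract does not give the exact plane layout, so the sketch below assumes an AlphaZero-style layout: one plane for the stones of the player to move, one for the opponent's stones, N planes that each mark one of the last N moves, and one plane indicating whether black is to move. The function name `encode_state`, the 15×15 board, `n_history=4`, and the plane ordering are hypothetical choices for illustration, not the thesis's actual design. Regional value subdivision is not modeled here.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (row, col); hypothetical move representation
Plane = List[List[float]]


def encode_state(moves: List[Move], board_size: int = 15,
                 n_history: int = 4) -> List[Plane]:
    """Encode a Gomoku position as feature planes (hypothetical layout).

    plane 0                : stones of the player to move
    plane 1                : stones of the opponent
    planes 2 .. 1+n_history: plane 2+k marks the k-th most recent move
                             (k = 0 is the last move), i.e. "N-step history"
    last plane             : all ones if black (first player) is to move
    """
    def empty() -> Plane:
        return [[0.0] * board_size for _ in range(board_size)]

    planes = [empty() for _ in range(3 + n_history)]
    to_move = len(moves) % 2  # 0: black to move, 1: white to move

    # Colours alternate starting with black, so move i belongs to player i % 2.
    for i, (r, c) in enumerate(moves):
        owner = 0 if i % 2 == to_move else 1
        planes[owner][r][c] = 1.0

    # History planes: redundant under the Markov view, since planes 0 and 1
    # already fix the position, but they expose recent move order to the network.
    for k in range(min(n_history, len(moves))):
        r, c = moves[-1 - k]
        planes[2 + k][r][c] = 1.0

    if to_move == 0:
        planes[-1] = [[1.0] * board_size for _ in range(board_size)]
    return planes


if __name__ == "__main__":
    planes = encode_state([(7, 7), (7, 8), (8, 8)])
    print(len(planes), planes[2][8][8])
```

With three moves played, white is to move, so white's stone at (7, 8) lands on plane 0, black's stones on plane 1, and the most recent move (8, 8) on plane 2. Planes 0 and 1 alone define the position; the history planes add only move-order information, which is the redundancy the abstract refers to.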
Keywords/Search Tags: Deep Reinforcement Learning, Monte Carlo Tree Search, Gomoku