
Research On Multi-player Imperfect Information Game Strategy Based On Fictitious Self-play

Posted on: 2019-05-10  Degree: Master  Type: Thesis
Country: China  Candidate: J B Mao  Full Text: PDF
GTID: 2428330590473919  Subject: Computer Science and Technology
Abstract/Summary:
Machine gaming is a popular and challenging research direction in artificial intelligence and has attracted wide attention from the academic community. In recent years, research on machine gaming has produced a number of remarkable results, such as AlphaGo, which defeated top human Go players, and Libratus. Machine-gaming techniques are now being applied to many practical problems, such as power dispatching, traffic control, and recommendation systems. According to the completeness of the game information, games are divided into perfect-information and imperfect-information games. Many real-world decision problems can be abstracted as strategy-optimization problems in imperfect-information games. However, existing strategy-optimization algorithms for imperfect-information games, such as the one behind Libratus, can only solve two-player games with discrete actions and simple state spaces, and thus cannot be applied well to real-world decision problems. It is therefore of great theoretical and practical significance to study multi-player imperfect-information strategy-optimization algorithms that support continuous actions and complex states.

Based on Fictitious Self-Play and combining deep learning with multi-agent reinforcement learning, this thesis uses Texas Hold'em and the multi-agent particle environment as experimental platforms to study strategy-optimization methods for multi-player imperfect-information games. Traditional methods for the imperfect-information game of Texas Hold'em rely on domain-specific card abstraction to reduce the size of the game tree, and therefore transfer poorly to other games. This thesis instead adopts the Fictitious Self-Play framework, which divides Texas Hold'em strategy optimization into two parts, learning a best-response strategy and learning an average strategy, realized by deep reinforcement learning and imitation learning respectively; this yields a general method for learning optimal strategies.

For two-player Texas Hold'em strategy optimization, this thesis learns the average strategy with a neural-network-based multi-class logistic regression trained on data collected by reservoir sampling, and learns the best-response strategy with a deep Q-network. Without relying on any domain knowledge, the agent achieves performance comparable to traditional iterative algorithms. For multi-player Texas Hold'em strategy optimization, a multi-agent actor-critic algorithm is introduced to learn the best-response strategy: the value network observes the states of all agents, which reduces value-estimation bias and alleviates the instability of traditional reinforcement-learning algorithms in multi-agent environments. Finally, to counter the effect of bad updates on multi-agent strategy optimization, this thesis proposes a multi-agent proximal policy optimization algorithm based on the idea of proximal policy optimization, which guarantees that each update monotonically improves the agent's strategy. In experiments, the algorithm achieves performance similar to or better than other current state-of-the-art reinforcement-learning algorithms.
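The Fictitious Self-Play decomposition described above can be sketched as an NFSP-style agent: with some probability the agent acts with its best-response policy and records the decision for supervised average-strategy learning; otherwise it follows the average policy. The mixing parameter `ETA`, the policy stubs, and the buffer capacity below are illustrative assumptions, not the thesis's actual settings; the reservoir buffer is a standard implementation of Vitter's Algorithm R.

```python
import random

class ReservoirBuffer:
    """Fixed-size buffer that keeps a uniform random sample of everything
    ever added (Vitter's Algorithm R). In NFSP-style training it stores
    best-response decisions as data for supervised average-strategy learning."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / n_seen, so every
            # item ever seen has an equal chance of being in the buffer.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = item

ETA = 0.1  # anticipatory mixing parameter (assumed value)

def best_response_action(state):
    return hash(state) % 3  # stand-in for an epsilon-greedy DQN action

def average_policy_action(state):
    return random.randrange(3)  # stand-in for the average-policy network

def act(state, sl_buffer):
    """One decision: with probability ETA act by best response and record the
    (state, action) pair for average-strategy learning; otherwise follow the
    average policy."""
    if random.random() < ETA:
        action = best_response_action(state)
        sl_buffer.add((state, action))
        return action
    return average_policy_action(state)

sl_buffer = ReservoirBuffer(capacity=64)
for step in range(5000):
    act(f"state-{step}", sl_buffer)
print(len(sl_buffer.items), sl_buffer.n_seen)  # buffer stays at its capacity
```

Reservoir sampling matters here because the buffer then holds an unbiased sample of the agent's entire best-response history, so the supervised network trained on it approximates the time-averaged strategy rather than only recent play.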
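The monotonic-improvement property attributed to the proximal update above comes from PPO's clipped surrogate objective, which caps how far a single update can move the policy away from the one that collected the data. A minimal single-agent sketch follows; the toy probabilities and the clip range `eps=0.2` are illustrative assumptions, and the thesis's multi-agent variant is not reproduced here.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - eps, 1 + eps] removes any incentive to move the policy far from
    the data-collecting policy, which is what guards against destructive
    "bad updates" in plain policy-gradient methods.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the pessimistic (minimum) of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy example: the first action's probability rises from 0.5 to 0.9
# (ratio 1.8), but the clipped objective credits at most ratio 1.2.
new_logp = [math.log(0.9), math.log(0.1)]
old_logp = [math.log(0.5), math.log(0.5)]
advantages = [1.0, -1.0]
print(clipped_surrogate(new_logp, old_logp, advantages))  # ≈ 0.2
```

Note the asymmetry: for the positive-advantage action the clip limits the gain (1.2 instead of 1.8), while for the negative-advantage action the minimum keeps the full penalty (-0.8), so the objective is always a pessimistic bound on the unclipped one.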
Keywords/Search Tags: imperfect information game, fictitious self-play, multi-agent reinforcement learning