
Research on Imperfect Information Game Strategy Based on Fictitious Self-Play

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: S H Hu
Full Text: PDF
GTID: 2370330611499747
Subject: Computer technology

Abstract/Summary:
In recent years, computer games have received extensive attention from academia and industry, and research in the field of machine game playing has achieved remarkable results. For example, DeepMind's AlphaGo defeated professional Go players, CMU's multiplayer no-limit Texas Hold'em poker agent Pluribus beat top human players, and OpenAI's OpenAI Five defeated a professional Dota 2 team. The related techniques are also being applied in many practical scenarios, such as intelligent transportation, intelligent recommendation, multi-round dialogue, and quantitative trading.

Computer games can be divided into perfect information games and imperfect information games; the major distinction between the two is whether the players are fully aware of the game state. Many decision problems in real-world scenarios can be modeled as strategy-solving problems in imperfect information games. However, current game-solving algorithms rely on state abstraction, are mostly limited to two-player games, and perform poorly in high-dimensional action spaces. It is therefore of great significance to study strategy-solving algorithms for imperfect information games that can handle complex state spaces, support continuous actions, and scale to multiplayer games. Based on the framework of fictitious self-play, this thesis combines Monte Carlo tree search, deep reinforcement learning, and multi-agent reinforcement learning to address the strategy optimization problem, using Texas Hold'em poker and Pommerman as experimental platforms for two-player and multi-player games respectively.

To avoid state abstraction in complex games, this thesis proposes an adaptive Monte Carlo tree search algorithm combined with deep reinforcement learning to compute the best-response strategy, and uses imitation learning to fit the global average strategy, which leads to more robust strategy optimization. To address the poor performance of traditional strategy optimization algorithms in continuous action spaces, a policy-gradient reinforcement learning algorithm applicable to high-dimensional action spaces is introduced, and a maximum-entropy term is added to balance exploration and exploitation during strategy optimization. For strategy optimization in multiplayer games, centralized training with decentralized execution is adopted to strengthen the sharing of global information and reduce the estimation error of the value network. To solve the credit-assignment problem in multiplayer games, a global baseline reward is introduced to measure each agent's action value more accurately. At the same time, the policy network is pre-trained to alleviate the sparse-reward problem; this gives fictitious self-play a warm start and accelerates the convergence of the strategy.

To verify the effectiveness of the improved fictitious self-play proposed in this thesis, a two-player no-limit Texas Hold'em poker agent is implemented according to the ACPC rules, and a multi-player Pommerman agent is implemented according to the NeurIPS 2018 competition rules. In the two-player Texas Hold'em experiments, fictitious self-play based on adaptive Monte Carlo tree search and imitation learning outperforms the traditional iterative algorithm. In the multi-player Pommerman experiments, multi-agent fictitious self-play based on maximum entropy and baseline rewards achieves performance comparable to other advanced multi-agent reinforcement learning algorithms.
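The core loop the abstract describes — maintaining a best-response strategy learned by reinforcement learning alongside an average strategy fitted by imitating one's own greedy play — can be illustrated with a minimal sketch. This is not the thesis's implementation: the class, its method names, and the tabular learners standing in for the deep best-response and imitation networks are all hypothetical simplifications, shown only to make the fictitious self-play structure concrete.

```python
import random

class FictitiousSelfPlayAgent:
    """Sketch of a fictitious self-play agent.

    A tabular Q-learner stands in for the deep RL best-response
    learner, and simple action counts stand in for the imitation-
    learned average strategy. All names here are illustrative.
    """

    def __init__(self, actions, anticipatory=0.1):
        self.actions = actions
        self.anticipatory = anticipatory  # prob. of playing the best response
        self.q = {}                       # state -> action -> value (best response)
        self.avg_counts = {}              # state -> action -> count (average strategy)

    def best_response(self, state):
        values = self.q.get(state, {})
        if not values:
            return random.choice(self.actions)
        return max(values, key=values.get)

    def average_strategy(self, state):
        counts = self.avg_counts.get(state)
        if not counts:
            return random.choice(self.actions)
        # Sample an action in proportion to how often it was played greedily.
        total = sum(counts.values())
        r, acc = random.random() * total, 0.0
        for action, c in counts.items():
            acc += c
            if r <= acc:
                return action
        return action

    def act(self, state):
        # Mix the two strategies, as in fictitious self-play: occasionally
        # play the best response and record it so the average strategy
        # gradually imitates the sequence of best responses.
        if random.random() < self.anticipatory:
            action = self.best_response(state)
            self.avg_counts.setdefault(state, {}).setdefault(action, 0)
            self.avg_counts[state][action] += 1
            return action
        return self.average_strategy(state)

    def update_q(self, state, action, reward, alpha=0.1):
        # One-step value update standing in for the deep RL learner.
        values = self.q.setdefault(state, {a: 0.0 for a in self.actions})
        values[action] += alpha * (reward - values[action])
```

In the full algorithm the Q-table would be a value/policy network trained by deep reinforcement learning (with the maximum-entropy bonus the abstract mentions), and the count table would be a policy network trained by imitation learning on the agent's own best-response behavior.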
Keywords/Search Tags:imperfect information games, fictitious self-play, deep reinforcement learning, Monte Carlo Tree Search