Research On Counterfactual Regret Minimization Based On Deep Learning And Regret Discount

Posted on: 2022-09-26  Degree: Master  Type: Thesis
Country: China  Candidate: X Z Sun  Full Text: PDF
GTID: 2518306569497634  Subject: Computer technology
Abstract/Summary:
Games can be found everywhere in real life. As a product of combining game theory with computer technology, machine game playing is an important research direction in artificial intelligence. Machine games are divided into perfect-information games and imperfect-information games. In a perfect-information game, every player knows the exact state of the others, whereas in an imperfect-information game players cannot observe all of the information and must therefore reason more carefully about their own strategies. Imperfect-information games are closer to the real world than perfect-information games: business negotiation, military confrontation, and financial regulation all involve hidden information. Studying imperfect-information games is therefore of great practical significance.

Counterfactual Regret Minimization (CFR), proposed in 2008, is one of the most effective methods for solving imperfect-information games. Although CFR can already solve large-scale games, the traditional algorithm has two problems: it converges slowly, and because it is an offline self-play training algorithm, it cannot accurately handle new situations that arise during actual play. To address these two problems, this thesis studies how to accelerate the convergence of CFR and how to combine CFR with deep-learning-based estimation.

To address slow convergence, we change the way regret values are accumulated across iterations. Traditional CFR traverses the entire game tree in every iteration, which is inefficient and time-consuming and limits its use on large-scale games. In earlier CFR variants, every iteration contributes to the accumulated regret with the same weight. We instead discount iterations when accumulating regret: early iterations are assigned smaller weights so that they have less influence on the final strategy.

To address the fact that an offline-computed CFR strategy is ill-suited to unknown situations in the game environment, we use a deep neural network to approximate the behavior of CFR over the whole course of the game. When solving large-scale extensive-form games, the original game usually has to be abstracted before CFR is run, which may lose information and reduce the accuracy of the resulting strategy. Combining CFR with neural network estimation avoids abstracting the original game and thereby further improves the efficiency and quality of strategy solving. To compare the different CFR algorithms, this thesis runs experiments on several imperfect-information card games and verifies the effectiveness of the proposed algorithm through analysis of the results.
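The regret-discount idea can be illustrated on a game small enough to need no tree traversal. The sketch below is not the thesis's algorithm: the abstract does not state the discount scheme, so it assumes weights in the style of Discounted CFR, where accumulated positive regret is scaled by t^alpha / (t^alpha + 1), accumulated negative regret by t^beta / (t^beta + 1), and iteration t contributes to the average strategy with weight t^gamma. The function names, the parameter defaults, the order of discounting and accumulation, and the 2x2 zero-sum payoff matrix are all assumptions chosen for illustration.

```python
def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def discount_factor(t, exponent):
    """Assumed DCFR-style factor t^e / (t^e + 1); small for early t when e > 0."""
    weight = float(t) ** exponent
    return weight / (weight + 1.0)


def solve_matrix_game(payoff, iterations=2000, alpha=1.5, beta=0.0, gamma=2.0):
    """Self-play with discounted regret matching on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives the
    negative. Returns the weighted average strategies (row, column).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    regrets = [[0.0] * n_rows, [0.0] * n_cols]
    strategy_sums = [[0.0] * n_rows, [0.0] * n_cols]

    for t in range(1, iterations + 1):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])

        # Expected payoff of each pure action against the opponent's current strategy.
        row_utils = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                     for i in range(n_rows)]
        col_utils = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                     for j in range(n_cols)]

        for player, (strategy, utils) in enumerate(((row, row_utils), (col, col_utils))):
            expected = sum(s * u for s, u in zip(strategy, utils))
            for a, u in enumerate(utils):
                old = regrets[player][a]
                # Discount what was accumulated so far, then add the new regret.
                factor = discount_factor(t, alpha if old > 0.0 else beta)
                regrets[player][a] = old * factor + (u - expected)
                # Later iterations count more in the average strategy.
                strategy_sums[player][a] += (t ** gamma) * strategy[a]

    return [[s / sum(sums) for s in sums] for sums in strategy_sums]


payoff = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical example game
row_avg, col_avg = solve_matrix_game(payoff)
print(row_avg, col_avg)
```

For this payoff matrix the equilibrium can be worked out by hand: each player mixes (0.4, 0.6), so the printed averages can be compared against that. Setting the discount factor to 1 and weighting every iteration equally in the average recovers ordinary, undiscounted regret matching, which makes the effect of discounting easy to compare.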
Keywords/Search Tags: imperfect information game, counterfactual regret minimization, regret discount, value network