| Games in the real world,such as war games,business competitions,sports competitions,and card games in daily life,are mostly multiplayer,competing,interactive,and imperfect-information.Mathematically,such games can be modeled as Competitive Extensive-Form Imperfect-Information Games(CEF-IIGs).Since such games exist widely,the algorithms for them have important application value.On the other hand,since such games are very complex,the algorithms are closely related to many fields of artificial intelligence,such as tree search,online optimization,deep learning,etc.Therefore,research on the algorithms can also promote the development of related fields.The complexity of this kind of games is mainly in three aspects:first,since the players are competing in this kind of games,they cannot be solved by conventional optimization methods;second,the nature of extensive-form games determines that the scale of the game and the difficulty of solving the game are exponentially related to the number of interactions;third,imperfect information means that the players cannot obtain global information during the play,so the algorithms for perfect-information games(e.g.,the game of Go)are no longer applicable.Currently,the main methods for solving CEF-IIGs are regret minimization.Among them,Counterfactual Regret Minimization(CFR)is the best in practice,which has been successfully applied to large-scale Texas Hold’em poker.Online Convex Optimization(OCO)is a kind of classical regret minimization algorithms,which has been applied to CEF-IIGs in recent years,and the related theory has been developing rapidly.However,there are still several key problems to be solved urgently in these algorithms:1)The problem of inconsistency between theory and practice.On the one hand,the practical convergence speed of CFRs is significantly faster than the theoretical guarantee;on the other hand,the theoretical advantages of the OCOs fail to transfer into practical advantages.2)Slow convergence problem.Traditional regret minimization algorithms require many iterations to obtain an acceptable approximate solution,which is not satisfactory in practical application.3)Inefficiency problem in largescale games.Traditional methods are computationally and memory-intensive.Besides,they usually rely on model-based sampling methods,which makes them inapplicable to more large-scale games that are difficult to model.Aiming at these problems,this thesis mainly studies Regret Minimization Algorithms for Competitive Extensive-Form Imperfect-Information Games.Specifically,for the inconsistency problem between theory and practice,this thesis studies and establishes the equivalence relationship between CFRs and OCOs,in order to open up a new research direction for improving the theoretical results of CFRs and the practical performance of OCOs;for the slow convergence problem,this thesis studies adaptive optimistic OCOs to improve the theoretical and practical convergence speed of the algorithms at the same time;for the inefficiency problem in large-scale games,this thesis studies an efficient CFR algorithm based on model-free sampling and neural networks.The main work and innovations of this thesis are summarized as follows:For the inconsistency of theory and practice,this thesis establishes and proves the equivalence relationship between CFRs and OCOs,which provides a new theoretical explanation for the advanced performance of CFRs,and at the same time improves the practical performance of OCOs.Specifically,this thesis proves that they are equivalent to CFRs.This equivalence can explain the CFRs as adaptive OCOs,thus indirectly explaining the performance advantage of CFRs in theory.Then,inspired by the adaptivity of CFRs,this thesis proposes new OCOs based on "future strategy" and studies their convergence performance on a variety of CEF-IIGs.The results show that the new future-dependent OCOs are not only significantly faster than traditional OCOs,but even faster than the state-of-the-art CFRs on some CEF-IIGs.For the slow convergence problem,this thesis proposes adaptive improved algorithms with significant theoretical advantages by generalizing existing optimistic OCOs.Then,combined with the previous work,efficient adaptive algorithms based on future strategies are proposed.Specifically,this thesis generalizes existing optimistic OCOs to make them adaptive,and by improving the regularization function,the upper bound of the exploitability is significantly improved(the dependence on game depth is reduced from exponential to polynomial).Then,based on the equivalence of OCOs and CFRs,this thesis proposes efficient optimistic OCOs based on future strategies,and proves that they are equivalent to predictive CFRs under certain conditions,thus providing the latter with a new theoretical explanation.The experimental results show that the adaptive optimistic OCOs proposed in this thesis are significantly faster than similar algorithms,and the future-dependent optimistic OCOs are faster than predictive CFRs on various CEF-IIGs.For the inefficiency problem in large-scale games,this thesis proposes a CFR based on model-free sampling and neural networks,which not only avoids the problems caused by model-based sampling,but also significantly increases the training efficiency.Specifically,this thesis proposes a novel CFR with recursive property and proves its convergence.The algorithm does not need to keep track of the cumulative counterfactual regrets at decision points.Then,a recursive CFR based on model-free sampling and neural networks is proposed.Instead of approximating cumulative counterfactual regrets,the algorithm only needs to approximate non-cumulative recursive substitute values,thus has significantly higher training efficiency.The experimental results in large-scale games show that the new algorithm not only has higher training efficiency,but also its performance is comparable to or even better than the state-of-the-art algorithms. |