Game problems have long been a focus of artificial intelligence research. With the rapid development of AI technology, excellent results have been achieved in many complete information game problems (e.g., Go and chess) through game tree search techniques, and incomplete information game problems have now also become an object of attention in game research. Unlike complete information games, the participants of an incomplete information game cannot observe all of the state information, and the many uncertainties make it difficult to expand a simple game tree. Incomplete information game problems therefore cannot be solved well by search techniques alone; the participants also need to make effective predictions about the opponents' hidden information. Opponent modeling has thus become an important method for handling the unobservable information in incomplete information games. Its role is to predict unknowable information from the observable information about the opponents, and to combine these predictions with techniques such as incomplete information game tree search to arrive at the decision that yields the greatest benefit to oneself.

In this paper, taking the mahjong game as the problem setting, we describe a network model for predicting opponent information using deep learning techniques and supervised learning theory, and apply it to two important tasks of predicting opponents' hidden information: dangerous tile prediction (tiles that an opponent can claim, i.e., "chow" or "pong") and opponent hand distribution estimation. The specific research work is as follows.

1. An efficient state encoding method is designed to describe the current and historical state information completely. Feature extraction is performed on the match information of the whole game to form tile information features, opponent action features and other features, which are encoded into a multi-channel tensor and used as the input of the network model (an illustrative sketch of such an encoding is given below). Because this state space encoding does not rely on mahjong domain knowledge for information compression, it is simple to implement, completely avoids the tedious feature combinations of traditional feature engineering, and is well suited to representation learning with convolutional neural networks.

2. A network architecture combining a CNN and an LSTM with an attention mechanism is proposed for opponent modeling in mahjong. The game data are encoded with the efficient features above; the spatial representation capability of the CNN is used to learn the semantic relationships in the tile information, and the LSTM structure combined with the attention mechanism is used to learn the intrinsic connection between the opponents' historical actions and their hidden information. On this basis we construct a dangerous tile prediction model and an opponent hand distribution estimation model. The dangerous tile prediction task is treated as a multi-label classification problem; the focal loss function is used to address the imbalanced label distribution, and the F1 and AUC values are used to evaluate prediction performance. The goal of the opponent hand distribution estimation task is to obtain the share of each of the 34 mahjong tile types in the opponent's hand (the result is a 34-dimensional vector that sums to 1). Mahjong domain knowledge is used to add prior features that improve prediction performance, and MSE and KL divergence are used as the loss functions of the model for comparison. Finally, comparison experiments show that the proposed network model achieves better prediction performance than other opponent information prediction models on the same test set. Sketches of such an architecture and of the loss functions are given below.
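The abstract does not specify the exact channel layout of the state encoding in item 1, so the following is only a minimal sketch of a multi-channel tile encoding of the kind described. The 4 x 34 "thermometer" plane per information source, and the particular sources included (own hand, opponents' discards and melds), are assumptions for illustration; the actual model also encodes action histories and other features.

```python
import numpy as np

NUM_TILE_TYPES = 34   # assumed: the 34 distinct tile types of standard mahjong
MAX_COPIES = 4        # each tile type has at most four copies

def encode_tile_counts(counts):
    """Encode a length-34 count vector as a 4 x 34 binary plane.

    Row k of the plane is 1 for a tile type if at least k+1 copies are
    present, a simple "thermometer" encoding of tile counts.
    """
    plane = np.zeros((MAX_COPIES, NUM_TILE_TYPES), dtype=np.float32)
    for tile, n in enumerate(counts):
        plane[:min(n, MAX_COPIES), tile] = 1.0
    return plane

def encode_state(own_hand, opponent_discards, opponent_melds):
    """Stack tile-information planes into a multi-channel tensor.

    own_hand: length-34 count vector of the agent's concealed tiles.
    opponent_discards / opponent_melds: one length-34 count vector per
    opponent. Action-history and "other" features mentioned in the text
    are omitted; this only illustrates the tile-information channels.
    Returns an array of shape (num_planes, 4, 34), ready for a CNN.
    """
    planes = [encode_tile_counts(own_hand)]
    for counts in opponent_discards:
        planes.append(encode_tile_counts(counts))
    for counts in opponent_melds:
        planes.append(encode_tile_counts(counts))
    return np.stack(planes, axis=0)
```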
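Likewise, the PyTorch sketch below shows one way the CNN + LSTM + attention pipeline of item 2 could be organized: a small CNN encodes each per-turn state tensor, an LSTM runs over the turn sequence, a learned attention weighting pools the sequence, and two heads produce the 34-way multi-label dangerous-tile logits and the 34-dimensional hand distribution. All layer sizes, kernel shapes and the single-layer attention are illustrative assumptions, not the thesis's actual hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpponentModel(nn.Module):
    """CNN + LSTM + attention sketch for mahjong opponent modeling.

    Input: a sequence of per-turn state tensors of shape
    (batch, seq_len, channels, 4, 34).
    """
    def __init__(self, in_channels, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 34)),
        )
        self.lstm = nn.LSTM(64 * 34, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.danger_head = nn.Linear(hidden, 34)   # per-tile multi-label logits
        self.hand_head = nn.Linear(hidden, 34)     # hand-distribution logits

    def forward(self, x):
        b, t = x.shape[:2]
        feats = self.conv(x.flatten(0, 1)).flatten(1)        # (b*t, 64*34)
        seq, _ = self.lstm(feats.view(b, t, -1))              # (b, t, hidden)
        weights = torch.softmax(self.attn(seq), dim=1)        # attention over turns
        context = (weights * seq).sum(dim=1)                  # (b, hidden)
        danger_logits = self.danger_head(context)             # sigmoid -> per-tile risk
        hand_dist = F.softmax(self.hand_head(context), dim=-1)  # 34-dim, sums to 1
        return danger_logits, hand_dist
```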
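For the two training objectives named in item 2, a minimal sketch of the binary focal loss (for the imbalanced multi-label dangerous-tile targets) and of the MSE / KL-divergence alternatives (for the 34-dimensional hand-distribution target) could look as follows. The gamma and alpha values are the common defaults from the focal-loss literature, not necessarily those used in the thesis.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss for the multi-label dangerous-tile task.

    logits, targets: tensors of shape (batch, 34). Down-weights easy,
    well-classified examples so the scarce "dangerous" labels contribute
    more to the gradient, which is how focal loss counters label imbalance.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob. assigned to the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

def hand_distribution_losses(pred_dist, target_dist, eps=1e-8):
    """MSE and KL divergence between predicted and true 34-dim hand distributions.

    Both inputs are assumed to be valid distributions (non-negative, summing to 1).
    """
    mse = F.mse_loss(pred_dist, target_dist)
    kl = F.kl_div((pred_dist + eps).log(), target_dist, reduction="batchmean")
    return mse, kl
```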
3. The results of these two opponent prediction tasks are integrated into the game tree to assist the mahjong AI program in making its tile decisions; a hypothetical sketch of this integration follows. Experiments show that the game tree decision procedure incorporating the opponent model proposed in this paper outperforms both the decision procedure without an opponent model and the decision procedures using other opponent models, in terms of both win rate and deal-in (feed) rate.
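How exactly the two predictions enter the game tree is not spelled out in this summary, so the following is only a hypothetical illustration of the general idea: the per-tile danger probabilities penalize risky discards, while the estimated hand distribution could be used to re-weight hidden tiles when expanding chance nodes (not shown). The penalty weight and the linear risk adjustment are assumptions, not the thesis's actual integration rule.

```python
def score_discard(base_value, danger_prob, deal_in_penalty=8.0):
    """Risk-adjusted value of a candidate discard (hypothetical weighting).

    base_value: value of the position after the discard, as returned by
    the game-tree search. danger_prob: the opponent model's probability
    that this tile is dangerous. deal_in_penalty is an assumed constant.
    """
    return base_value - deal_in_penalty * danger_prob

def choose_discard(candidates):
    """candidates: list of (tile, base_value, danger_prob) triples."""
    best_tile, best_score = None, float("-inf")
    for tile, value, danger in candidates:
        score = score_discard(value, danger)
        if score > best_score:
            best_tile, best_score = tile, score
    return best_tile
```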