| The rapid development of deep learning and reinforcement learning has driven innovation in related theories and technologies across artificial intelligence. Since the great success of AlphaGo in 2016, more and more researchers have turned their attention to algorithms for machine games, one of the most difficult research directions in artificial intelligence. According to whether the agent has complete information, machine games are divided into two categories: complete-information games and incomplete-information games. The success of the Go agent represents a historic breakthrough for complete-information games. Many problems remain unsolved in incomplete-information games, where the state and action spaces are larger and more challenging. In an incomplete-information game, the participants’ information is not fully open to one another, and no player fully knows the characteristics of the others or the information they hold, which makes such games more difficult than complete-information games. This paper takes mahjong, a typical incomplete-information game, as its starting point. To address the high state dimensionality and huge action space of incomplete-information games, deep reinforcement learning is used to explore new A3C-based methods and techniques for these games. The main innovations of this paper are as follows: (1) To capture the categorical features of incomplete-information games, a semantic feature based on category coding is proposed. This encoding serves as the input feature of the models proposed later, with the aim of helping the model learn abstract knowledge and extract features more effectively. (2) Building on the strengths of deep reinforcement learning models and on the category-coded semantic features, an improved A3C model is proposed, and a decision-making method is designed on top of it to solve the decision-making problem in incomplete-information games. The improved model modifies the "worker" network structure of the original A3C model and is trained by self-play. Real-time matches against human players on the Tenhou online competition website verify that the proposed decision-making method achieves a higher winning rate. (3) To further optimize the decision-making method and take full account of opponents' behavior, a defense model (Defence Model) is proposed to predict the draw behavior of other players. The decision method based on the improved A3C model is then combined with the defense model into a joint decision method, which better reflects human-like "offense-defense" thinking in the game. Real-time testing against human players on the Tenhou online competition website verifies that the joint decision-making method achieves a higher winning rate. |
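
The abstract does not spell out the category-coding scheme, so the following is only an illustrative sketch of what a category-based input feature for mahjong could look like, not the paper's actual encoding. It assumes Tenhou-style tile strings (`m`, `p`, `s` for the three suits, `z` for honors) and counts tiles in a 4 x 9 grid with one row per tile category. The names `SUITS`, `MAX_RANK`, and `encode_hand` are hypothetical, and red fives (written with rank 0 on Tenhou) are deliberately left out.

```python
# Hypothetical category-coded hand features: one row per tile category,
# one column per rank. This is an assumed layout, not the paper's scheme.
SUITS = ("m", "p", "s", "z")  # characters, dots, bamboo, honors
MAX_RANK = {"m": 9, "p": 9, "s": 9, "z": 7}  # honors only have 7 kinds


def encode_hand(tiles):
    """Return a 4x9 grid of tile counts grouped by category (suit)."""
    features = [[0] * 9 for _ in SUITS]
    for tile in tiles:
        rank, suit = int(tile[:-1]), tile[-1]
        if suit not in SUITS or not 1 <= rank <= MAX_RANK[suit]:
            raise ValueError(f"unknown tile: {tile!r}")
        features[SUITS.index(suit)][rank - 1] += 1
    return features


hand = ["1m", "2m", "3m", "5p", "5p", "7s", "1z"]
features = encode_hand(hand)
for suit, row in zip(SUITS, features):
    print(suit, row)
```

Grouping counts by category keeps tiles of the same suit adjacent, which is one way a network could pick up on sequences and pairs. A real model input would typically stack further planes (discards, melds, opponents' visible tiles) on top of this.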