| In today’s era of Artificial Intelligence, society is developing rapidly and major breakthroughs have been made in many fields. Since the birth of artificial intelligence, computer game intelligence has been regarded as one of its most popular and challenging research directions and has received wide attention from the academic community. In recent years, research on computer game intelligence has gradually moved from relatively simple settings, such as two-player zero-sum games with perfect information, to multiplayer games with imperfect information in complex environments. Most imperfect-information multiplayer games pose serious challenges for AI agent research because of their complex rules and large amount of hidden information. Reinforcement Learning (RL) is currently a popular approach to training computer agents. Combined with Deep Learning (DL) methods, it seeks the optimal strategy through continual trial and error and maximization of cumulative reward, while learning feature representations with deep convolutional networks. Deep Reinforcement Learning (DRL) algorithms greatly improve the performance of AI agents while enabling them to train themselves heuristically. In practical applications, however, traditional DRL methods run into adaptability and performance problems in imperfect-information games, such as difficulty adapting to complex environment rules, poor agent performance, slow model training, and excessive hardware requirements for training, all of which ultimately hinder the training and learning of the agent. This paper introduces the core research content of multiplayer Monte Carlo Tree Search (MCTS) and RL, and implements a Deep Learning based learning algorithm for computer game agents in multiplayer games with imperfect information. Through in-depth research on DRL algorithms, an efficient and highly extensible learning 
algorithm is proposed without sacrificing performance, and its application to computer game intelligence is studied. The main research content of the paper is as follows: (1) Problem statement and analysis. The current state of research on computer games is introduced, the practical problems faced by multiplayer games with imperfect information are analyzed, the relevant theories and algorithms are reviewed, and the main research content of this paper is then set out. (2) Learning a multiplayer Mahjong strategy under perfect information with the MCTS algorithm. During learning, the state transitions of the environment are complex and hard to learn. To solve this problem, an implicit Markov decision process (MDP) is introduced to parameterize the state transitions in the game tree. The multiplayer Mahjong game tree is simplified and implemented, and transfer learning is then used to obtain the final Mahjong agent. During learning, the network model essentially converges within 800,000 steps, and accuracy on test samples is above 78%. Evaluation matches played on the Tenhou platform by the agent built with the algorithm in this paper show that it reaches a performance level of about the fifth stage, a relatively good result. (3) An improved learning method based on policy gradient is proposed. By combining weighted importance sampling with an additional reward distribution model that improves the learning signal, the method raises both the utilization of trajectory data and the learning efficiency of the agent model, and it is successfully applied to Mahjong. Experiments show that the improved asynchronous policy gradient reinforcement learning method significantly improves data utilization, learning speed, and convergence. They also show that, after introducing improvements such as weighted importance sampling, the accuracy of the asynchronous 
policy gradient model on test samples is about 3.38% higher than that of the model in Chapter 3. In matches against the opponent model, the performance level of the agent improves by about 12.96% and reaches the sixth stage, one stage above the fifth-stage agent of Chapter 3. (4) Building the platform on which the algorithm runs, setting up a local environment, and constructing the Mahjong agent. Based on the Tenhou platform API and on the network structure and system architecture of the game platform used in computer game tournaments, the agent client is designed and the Mahjong agent is implemented. |
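
The abstract names weighted importance sampling as one of the techniques used to reuse trajectory data. As a rough illustration of the general idea only, and not the method implemented in this work, the sketch below compares the ordinary and the weighted importance sampling estimates of a policy's expected return from trajectories collected under a different behavior policy. All function names and the toy probabilities and returns are hypothetical.

```python
from math import prod


def importance_ratio(target_probs, behavior_probs):
    """Product of per-step ratios pi(a|s) / b(a|s) along one trajectory."""
    return prod(p / q for p, q in zip(target_probs, behavior_probs))


def ordinary_is(returns, ratios):
    """Ordinary importance sampling: unbiased, but can have high variance."""
    return sum(r * g for r, g in zip(ratios, returns)) / len(returns)


def weighted_is(returns, ratios):
    """Weighted importance sampling: normalizes by the sum of the ratios,
    trading a small bias for lower variance."""
    total = sum(ratios)
    if total == 0.0:
        return 0.0  # no trajectory is possible under the target policy
    return sum(r * g for r, g in zip(ratios, returns)) / total


# Toy data (hypothetical): per-step action probabilities under the target
# policy and the behavior policy, plus the observed return of each trajectory.
trajectories = [
    ([0.5, 0.5], [0.25, 0.5], 1.0),
    ([0.25, 0.5], [0.5, 0.5], 0.0),
    ([0.5, 0.5], [0.5, 0.5], 4.0),
]

ratios = [importance_ratio(t, b) for t, b, _ in trajectories]
returns = [g for _, _, g in trajectories]

print("ordinary IS estimate:", ordinary_is(returns, ratios))
print("weighted IS estimate:", weighted_is(returns, ratios))
```

Because the weighted estimate is a convex combination of the observed returns, it always stays between the smallest and largest return, which is one reason it tends to give a steadier learning signal than the ordinary estimate when the ratios vary widely.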