Endowing agent players with a high level of play against human players is part of the essence of computer games, which makes behavior decision-making for agents a key research direction. However, as games evolve from traditional symmetric forms into complex asymmetric ones, the poor generalization and heavy manual intervention of traditional methods result in a very low level of decision-making. Deep reinforcement learning allows an agent to decide independently: by training a neural network under the combined effect of state information, reward functions, and the learning algorithm, the agent acquires strong decision-making ability. To enable agents to make decisions independently in asymmetric games, this paper constructs a new asymmetric game environment, proposes two agent decision-making methods, and ultimately enables both sides to achieve confrontation and cooperation in the asymmetric game environment. The main work is as follows:

1. By analyzing the game relationship and the settings of similar games, and drawing on the real-life model of several police officers catching a thief, this paper independently designs and builds a new set of asymmetric game rules and a matching environment. To make decision-making harder for the agents, a three-dimensional game scene with sparse rewards is established; a variety of elements, including dynamic ones, are placed in the scene, dynamic disturbances are added to the game, and the difficulty of exploration is increased. A custom game framework is also designed for the game, supporting both single-agent and multi-agent play.

2. For thief-agent decision-making, this paper presents an RND-PPO method combined with an LSTM network. The method uses a reward function that combines triggered rewards with continuous rewards to convey the rules to the agent and improve training efficiency, and it defines rule-compliant action and state spaces so the agent can observe and act. The paper also improves the RND-PPO algorithm: historical information from the experience pool is fed into a long short-term memory network to predict states and extrinsic rewards, and the value and policy networks are trained with a combined advantage function that merges intrinsic rewards, extrinsic rewards, and the predicted extrinsic rewards. This strengthens the guidance of the extrinsic reward function and counters the policy degradation caused by relying on intrinsic rewards. For police-agent decision-making, this paper presents a multi-agent method based on the multi-agent deep deterministic policy gradient (MADDPG). The method uses a team-shared reward function together with MADDPG's centralized-training, decentralized-execution framework, so that multiple agents share a unified task objective and gain the ability to make cooperative decisions.

3. Comparative experiments verify the effectiveness and superiority of the proposed algorithm for thief-agent decision-making. Building on the custom game framework, an asymmetric game experiment further verifies that agents equipped with the two methods are capable of complex behavior and decision-making and achieve their goals well in the game.
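To make the triggered-plus-continuous reward design of item 2 concrete, the sketch below shows one plausible shape for the thief's reward function. The event names, shaping coefficients, and `ThiefState` fields are illustrative assumptions, not the values used in this paper.

```python
from dataclasses import dataclass

@dataclass
class ThiefState:
    goal_distance: float       # current distance to the exit (hypothetical field)
    prev_goal_distance: float  # distance on the previous step (hypothetical field)

def thief_reward(state: ThiefState, event: str = "") -> float:
    """Combine sparse triggered rewards with a continuous shaping term."""
    r = 0.0
    # Triggered rewards: fire once when a rule-relevant event occurs,
    # informing the agent of the game rules.
    if event == "reached_exit":
        r += 1.0
    elif event == "caught_by_police":
        r -= 1.0
    # Continuous rewards: small per-step progress signal toward the goal,
    # plus a time penalty, to densify the otherwise sparse reward.
    r += 0.01 * (state.prev_goal_distance - state.goal_distance)
    r -= 0.001
    return r
```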
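The combined advantage function of item 2 can be sketched by computing a separate generalized advantage estimate per reward stream and mixing them with fixed weights, as is common practice with RND. The coefficients `c_ext`, `c_int`, `c_pred` and the reuse of the extrinsic value baseline for the predicted stream are assumptions, not the paper's exact formulation.

```python
import numpy as np

def gae(rewards, values, gamma, lam):
    """Generalized advantage estimation over one trajectory (float arrays)."""
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def combined_advantage(r_ext, r_int, r_pred, v_ext, v_int,
                       c_ext=2.0, c_int=1.0, c_pred=0.5,
                       gamma_ext=0.999, gamma_int=0.99, lam=0.95):
    """Merge extrinsic, intrinsic (RND), and LSTM-predicted extrinsic
    reward streams into a single advantage signal for the PPO update."""
    a_ext = gae(r_ext, v_ext, gamma_ext, lam)    # sparse task reward
    a_int = gae(r_int, v_int, gamma_int, lam)    # RND novelty bonus
    a_pred = gae(r_pred, v_ext, gamma_ext, lam)  # LSTM-predicted extrinsic reward
    return c_ext * a_ext + c_int * a_int + c_pred * a_pred
```

Weighting the extrinsic and predicted streams against the intrinsic one is what keeps the novelty bonus from drowning out the task signal, which is the policy-loss failure mode the combined function is meant to counter.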
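For the police side, item 2's MADDPG pairs one decentralized actor per agent with a centralized critic. A minimal PyTorch sketch of that split follows; the network widths and interfaces are assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: each police agent acts from its own observation."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observation-action of all agents.
    Its TD target uses the team-shared reward, so every agent optimizes
    the same cooperative objective."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))
```

During training the critic observes everything; at execution time only the actors run, each on its local observation, which is what lets the learned cooperation deploy without inter-agent communication.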