| With the rapid development of national defense and military informatization, new intelligent equipment is continually entering the operational command system and has profoundly changed the form and mode of war. On the one hand, the number of operational command network elements is growing and the boundaries of the information battlefield are expanding far beyond the scope of human cognitive ability, so intelligent decision-making technology is urgently needed to assist operational command. On the other hand, new unmanned autonomous equipment is being deployed in the army, and new technologies and tactics are urgently needed. Because it is real-time and realistic, the computer wargame has become an important means of simulating military confrontation. It can be used directly to verify operational plans and to study new tactics, and its results can be applied to assistant decision-making in operational command and to the autonomous control of unmanned equipment. Compared with traditional experimental environments, agent strategy generation in a wargame environment faces challenges such as complex state and action spaces, partial observability, uncertain opponent strategies, a highly random environment, and multi-agent confrontation. To address the fixed behavior patterns of agents and the difficulty of optimizing reinforcement learning strategies in wargame environments, this paper studies entity behavior modeling, target point decision-making, and path planning in wargames, drawing on knowledge modeling, game theory, imitation learning, reinforcement learning, and other techniques, and explores a knowledge-driven and data-driven method of confrontation strategy generation. The main work is as follows: 1. In view of the complexity of decision-making in the wargame environment, this paper takes the tactical wargame as an example, designs the agent framework, and proposes methods for 
multi-piece cooperation and single-piece action execution. Based on a hierarchical action architecture, entity decision-making behavior is modeled: a finite state machine switches between high-level tasks, and a behavior tree controls the actions taken while a single piece executes its task. The analysis narrows the research focus to piece maneuver decisions, which are divided into two key problems: target point selection and maneuver path planning. A knowledge-driven and data-driven approach to solving agent strategies is proposed and integrated into the agent framework design to support the study of adversarial strategy generation. 2. To address the fixed behavior patterns of traditional agents and the cold start of reinforcement learning in the wargame environment, this paper proposes a mixed-strategy construction method based on prior payoffs, which provides an initial strategy for learning. By analyzing the target point decision-making problem in the wargame, a strategic game model of that problem is constructed. Based on prior knowledge, the payoff matrix of candidate targets is constructed, and the mixed-strategy equilibrium is obtained by solving for the Pareto front and a linear programming problem. Experiments show that agent behavior based on the mixed strategy is more reasonable and diverse, and achieves a higher winning rate in both machine-versus-machine and human-versus-machine games. 3. Because the construction and solution of the strategic game model are easily influenced by subjective human experience, a mixed-strategy optimization method based on game learning is proposed. A neural network represents the target point selection strategy, and representations of environment characteristics, the enemy and friendly situation, and confrontation tasks are designed. In 
order to solve the problem of cold start in reinforcement learning, imitation learning is used to clone the behavior of the target point selection strategy derived from the strategic game model. Regularized Nash dynamics and reinforcement learning are then combined to further optimize the strategy, overcoming the limitations of traditional knowledge-based agent strategies. Experiments show that the proposed method can learn and optimize the confrontation strategy obtained from the strategic game model and further improve the winning rate. 4. In view of the highly dynamic and highly stochastic wargame environment, a dynamic maneuver path planning algorithm based on value function iteration is proposed for the case where the opponent’s strategy is known, and a maneuver path reinforcement learning method based on reward expectation is proposed for the case where the opponent’s strategy is unknown; a derivation establishes the effectiveness of the method. An experiment is designed around the problem of planning a path to the target point during tactical confrontation. Comparative tests with different reinforcement learning algorithms and reward functions show that the reinforcement learning algorithm based on reward expectation can find the optimal maneuver path and converges faster in a highly random environment. |
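
To make the value-iteration idea in item 4 concrete, the following is a minimal sketch and not the algorithm developed in this work. It assumes a small square grid instead of a wargame map, 4-neighbour moves, and a known per-cell risk (standing in for a known opponent strategy) that is added to the movement cost. The names `value_iteration`, `greedy_path`, `risk`, and `risk_weight` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-neighbour moves


def neighbours(cell: Cell, rows: int, cols: int) -> List[Cell]:
    """Cells reachable in one move without leaving the grid."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def step_cost(risk: List[List[float]], cell: Cell, risk_weight: float) -> float:
    """Assumed cost of entering a cell: one move plus a weighted risk term."""
    return 1.0 + risk_weight * risk[cell[0]][cell[1]]


def value_iteration(risk: List[List[float]], goal: Cell,
                    risk_weight: float = 10.0,
                    max_sweeps: int = 1000) -> Dict[Cell, float]:
    """Iterate the Bellman update until the cost-to-go V stops changing."""
    rows, cols = len(risk), len(risk[0])
    V = {(r, c): float("inf") for r in range(rows) for c in range(cols)}
    V[goal] = 0.0
    for _ in range(max_sweeps):
        changed = False
        for cell in V:
            if cell == goal:
                continue
            # Bellman update: cheapest (entry cost + cost-to-go) over moves.
            best = min(step_cost(risk, n, risk_weight) + V[n]
                       for n in neighbours(cell, rows, cols))
            if best < V[cell]:
                V[cell] = best
                changed = True
        if not changed:
            break
    return V


def greedy_path(V: Dict[Cell, float], risk: List[List[float]],
                start: Cell, goal: Cell,
                risk_weight: float = 10.0) -> List[Cell]:
    """Follow the move that minimises entry cost plus cost-to-go."""
    rows, cols = len(risk), len(risk[0])
    path, cur = [start], start
    while cur != goal:
        cur = min(neighbours(cur, rows, cols),
                  key=lambda n: step_cost(risk, n, risk_weight) + V[n])
        path.append(cur)
    return path


# Hypothetical 3x3 map: the two upper cells of the middle column are risky.
risk_map = [
    [0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0],
]
values = value_iteration(risk_map, goal=(0, 2))
route = greedy_path(values, risk_map, start=(0, 0), goal=(0, 2))
```

In this toy map the planned route goes around the risky cells through the bottom row, because two extra safe moves cost less than one weighted risky move. A stochastic environment or an unknown opponent would require expectations over outcomes, which this sketch does not model.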