With the development of computer technology, the design and development of combat game platforms has gradually become a research focus of major institutions. However, there has been little research on agent modeling for combat platforms, especially on applying deep reinforcement learning to mixed units. Command and control is a major difficulty in modeling combat agents. This thesis proposes a modeling method for a mixed-units command and control agent based on deep reinforcement learning. The specific design and research work is as follows. First, for the homogeneous-units command and control problem, a deep reinforcement learning agent for homogeneous units is proposed and designed. It handles complex action commands through a hierarchical action space scheme and reconstructs the reward function by adding auxiliary rewards such as distance rewards. Simulation experiments compare the effectiveness of single-moment and multi-moment methods for state representation. Second, for the heterogeneous-units command and control problem, a heterogeneous-units command and control agent based on deep reinforcement learning is proposed and designed. It aligns the granularity of action commands by combining knowledge rules with a reinforcement learning hierarchical action space, resolves the inconsistent interaction frequency problem through independent training, and is combined with a variety of multi-agent reinforcement learning algorithms to complete the heterogeneous-units command and control task. Finally, building on the above two command and control problems, a mixed-units command and control agent based on deep reinforcement learning is designed and implemented, and two formation schemes are proposed: homogeneous division followed by heterogeneous division, and heterogeneous division followed by homogeneous division. These schemes address the large number of combat units and the large action space. In the state space design, the multi-moment state design method is used to improve the effectiveness of state representation, and the reward function is reconstructed as a dense reward to improve training efficiency. The experimental results show that the combat agent designed in this thesis can complete combat deduction effectively on the combat platform.
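
To make two of the ideas above concrete, the following is a minimal illustrative sketch, not the implementation used in this thesis. It assumes that a multi-moment state is formed by concatenating the most recent k observations, and that the distance reward is a bonus proportional to how much a unit closed the distance to its target in one step. The names `MultiMomentState`, `shaped_reward`, and the `weight` parameter are hypothetical.

```python
from collections import deque


class MultiMomentState:
    """Hypothetical helper: keeps the last k observations and
    concatenates them into a single flat state vector."""

    def __init__(self, k, obs_dim):
        self.obs_dim = obs_dim
        # Start with k zero-filled frames so the state has a fixed length.
        self.frames = deque([[0.0] * obs_dim for _ in range(k)], maxlen=k)

    def push(self, obs):
        """Add the newest observation and return the stacked state."""
        if len(obs) != self.obs_dim:
            raise ValueError("observation has the wrong dimension")
        self.frames.append(list(obs))  # the oldest frame is dropped automatically
        return self.state()

    def state(self):
        """Flatten the frames from oldest to newest."""
        return [x for frame in self.frames for x in frame]


def shaped_reward(env_reward, prev_dist, curr_dist, weight=0.1):
    """Assumed form of a distance reward: the sparse environment reward
    plus a dense bonus for reducing the distance to the target."""
    return env_reward + weight * (prev_dist - curr_dist)
```

With k = 1 the helper reduces to a single-moment representation, so the same interface could be used to compare the two representations under otherwise identical training settings.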