Artificial intelligence technology has evolved rapidly in recent years. Machine gaming, a popular research field, has attracted wide attention from researchers, and machine game methods represented by deep reinforcement learning algorithms have developed considerably. With the success of the AlphaGo agent, a major breakthrough has been achieved in perfect-information machine games. Imperfect-information machine games have since become a new research focus in artificial intelligence because of their high complexity and incomplete information perception. This dissertation mainly studies how to solve 3D video games under imperfect-information conditions.

To address the high-dimensional state space and incomplete information perception of 3D video machine games, this dissertation proposes a deep reinforcement learning method built on an intrinsic reward based policy optimization algorithm. First, we restrict the update range of the action policy ratio to alleviate the high variance and instability of traditional deep reinforcement learning algorithms. Second, because agents in these 3D scenes commonly receive little feedback reward from the environment, we propose an intrinsic reward model. By designing a target mapping network and a prediction network, this mechanism generates intrinsic rewards that compensate for the missing environmental rewards and help update the agent's action policy. Finally, because the intrinsic reward model and the traditional policy optimization algorithm have different structures, we adjust the structure of the value network and then combine the two modules. Based on these modifications, we propose the intrinsic reward based policy optimization algorithm, which improves the effectiveness of deep reinforcement learning in sparse-reward 3D scenes.

To enhance the agent's perception of environmental reward information and the accuracy of its state-information estimates, this dissertation proposes and designs three types of auxiliary learning tasks based on the auxiliary task mechanism of multi-task learning. We sample the agent's interaction data from an experience replay buffer and use it as training data for the auxiliary tasks, effectively combining deep reinforcement learning with auxiliary task learning. On this basis, the reward enhancement method built on the auxiliary task mechanism is combined with the intrinsic reward based policy optimization algorithm to further improve the performance of agents trained by the original reinforcement learning algorithm in 3D scenes. The 3D video game ViZDoom serves as the test platform for the deep reinforcement learning algorithms, and experimental analysis verifies the effectiveness of the proposed algorithms.
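The first modification restricts how far the action policy ratio can move in a single update. This corresponds to the clipped surrogate objective of Proximal Policy Optimization (PPO). The sketch below is a minimal pure-Python illustration under stated assumptions and is not the dissertation's implementation. The clip range `epsilon = 0.2` is a hypothetical value, and so are the advantage weights `ext_coef` and `int_coef`. Summing separately estimated extrinsic and intrinsic advantages is likewise only an assumed way of combining the adjusted value network with the intrinsic reward.

```python
import math
from typing import Sequence


def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO-style clipped objective for one sample (to be maximized).

    The ratio pi_new(a|s) / pi_old(a|s) is clamped to [1 - epsilon, 1 + epsilon],
    and the smaller (pessimistic) of the clipped and unclipped terms is kept.
    """
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


def policy_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    ext_advantages: Sequence[float],
    int_advantages: Sequence[float],
    epsilon: float = 0.2,
    ext_coef: float = 2.0,  # hypothetical weight for the extrinsic advantage
    int_coef: float = 1.0,  # hypothetical weight for the intrinsic advantage
) -> float:
    """Negative mean clipped surrogate over a batch (a loss to minimize)."""
    total = 0.0
    for new_lp, old_lp, adv_e, adv_i in zip(
        new_log_probs, old_log_probs, ext_advantages, int_advantages
    ):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old from log-probabilities
        advantage = ext_coef * adv_e + int_coef * adv_i  # assumed combination
        total += clipped_surrogate(ratio, advantage, epsilon)
    return -total / len(new_log_probs)


if __name__ == "__main__":
    # With a positive advantage, a ratio of 1.5 is capped at 1 + epsilon,
    # so the objective gains nothing from pushing the policy further.
    print(clipped_surrogate(1.5, 1.0))
    print(policy_loss([-0.5, -1.0], [-0.5, -1.2], [1.0, -0.5], [0.3, 0.1]))
```

Taking the minimum of the clipped and unclipped terms bounds improvement from above. Large harmful steps (negative advantage with a ratio far from 1) are still fully penalized. This is what dampens the high variance and instability mentioned above.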
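The intrinsic reward model pairs a target mapping network with a prediction network. This design closely resembles Random Network Distillation (RND). A fixed, randomly initialized target network embeds each observation, and a trainable predictor learns to imitate that embedding. The prediction error then serves as the intrinsic reward, so rarely visited states earn larger bonuses. The following is a minimal pure-Python sketch under that assumption. The single-layer linear networks, feature size, learning rate, and class names are all hypothetical illustration choices, not the dissertation's architecture.

```python
import random
from typing import List, Sequence


class LinearNet:
    """Single linear layer y = W x (hypothetical stand-in for a deep network)."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random, scale: float = 1.0):
        self.weights: List[List[float]] = [
            [rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def forward(self, x: Sequence[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class IntrinsicRewardModel:
    """RND-style bonus: mean squared error between predictor and fixed target."""

    def __init__(self, obs_dim: int, feat_dim: int = 8, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.target = LinearNet(obs_dim, feat_dim, rng)  # fixed: never updated
        self.predictor = LinearNet(obs_dim, feat_dim, rng, scale=0.1)  # trained
        self.lr = lr

    def reward(self, obs: Sequence[float]) -> float:
        target = self.target.forward(obs)
        pred = self.predictor.forward(obs)
        return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

    def update(self, obs: Sequence[float]) -> None:
        """One gradient-descent step on the predictor's mean squared error."""
        target = self.target.forward(obs)
        pred = self.predictor.forward(obs)
        n = len(target)
        for k, (p, t) in enumerate(zip(pred, target)):
            grad_out = 2.0 * (p - t) / n  # d(MSE)/d(pred_k)
            row = self.predictor.weights[k]
            for j, xj in enumerate(obs):
                row[j] -= self.lr * grad_out * xj  # d(pred_k)/d(w_kj) = x_j


if __name__ == "__main__":
    model = IntrinsicRewardModel(obs_dim=4)
    familiar, novel = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
    before = model.reward(familiar)
    for _ in range(200):
        model.update(familiar)  # the agent keeps revisiting the same state
    print(before, model.reward(familiar), model.reward(novel))
```

In the usage example, the bonus for the repeatedly visited state shrinks toward zero. The bonus for the unvisited state is unchanged. The intrinsic reward therefore keeps pointing the agent toward unexplored parts of a sparse-reward scene.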
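The auxiliary tasks are trained on interaction data sampled from an experience replay buffer. The three tasks are not named here, so the sketch below assumes one plausible example: a reward-prediction task in the spirit of UNREAL. Its batches are drawn so that rewarding and zero-reward transitions appear in equal proportion even when environmental rewards are sparse. The `Transition` fields, the 50/50 split, and all names are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass
class Transition:
    obs: Sequence[float]
    action: int
    reward: float
    next_obs: Sequence[float]


class ReplayBuffer:
    """Fixed-capacity FIFO store of the agent's interaction data."""

    def __init__(self, capacity: int):
        self.data: Deque[Transition] = deque(maxlen=capacity)  # oldest dropped first

    def add(self, transition: Transition) -> None:
        self.data.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Uniform sample without replacement, capped at the buffer size."""
        return rng.sample(list(self.data), min(batch_size, len(self.data)))

    def sample_reward_balanced(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Half rewarding, half zero-reward transitions (hypothetical split)."""
        rewarding = [t for t in self.data if t.reward != 0.0]
        zero = [t for t in self.data if t.reward == 0.0]
        if not rewarding or not zero:
            return self.sample(batch_size, rng)  # fall back if one class is missing
        half = batch_size // 2
        batch = [rng.choice(rewarding) for _ in range(half)]
        batch += [rng.choice(zero) for _ in range(batch_size - half)]
        rng.shuffle(batch)
        return batch


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = ReplayBuffer(capacity=1000)
    for step in range(500):
        r = 1.0 if step % 50 == 0 else 0.0  # sparse: 2% of steps are rewarding
        buffer.add(Transition([float(step)], step % 3, r, [float(step + 1)]))
    batch = buffer.sample_reward_balanced(32, rng)
    print(sum(t.reward != 0.0 for t in batch), "of", len(batch), "are rewarding")
```

Rebalancing matters because a uniformly sampled batch from a sparse-reward episode would contain almost no rewarding transitions. A reward-prediction head trained on such batches could learn to always predict zero.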