Reinforcement learning is a fundamental branch of machine learning. It is a trial-and-error process in which an agent interacts with its environment and learns a strategy by maximizing cumulative reward. In recent years, with the rapid development and wide application of deep learning, deep reinforcement learning, which combines reinforcement learning with deep learning, has achieved major breakthroughs in intelligent decision-making, recommendation systems, robot control, autonomous driving, and other fields. However, deep reinforcement learning still faces many problems in practice, and sparse reward is a major challenge. Compared with traditional reinforcement learning, deep reinforcement learning is applied to complex scenarios with huge amounts of data to process, such as virtual shooting scenes. In such settings, the gap between the reward obtained by achieving the goal and the space that must be explored during training becomes too wide, which leads to the sparse reward problem. Focusing on the virtual shooting scene in games, and addressing the two causes of sparse reward, namely that the main reward is too scarce and that the exploration space is too large, this paper carries out a series of studies and optimizations and proposes corresponding novel algorithms that effectively mitigate the sparse reward problem in reinforcement learning. The main contents of this paper are as follows:

1. An improved deep reinforcement learning method based on reward trajectories is proposed and implemented. The method first uses optical flow to predict target trajectories, then defines a spatial reward trajectory according to the spatial positions and motion trajectories of all targets in the scene, automatically generates spatial reward points, and thereby guides the algorithm to move in the right direction. Using DQN as the basic network structure, the algorithm is optimized to enhance performance and strengthen the influence of the main reward on the initial state. Compared with the original algorithm, the improved algorithm increases the task completion rate by 30.56% and the accuracy rate by 19.44%, which to some extent solves the sparse reward problem caused by too few main rewards.

2. An improved deep reinforcement learning method based on RGB-D images is proposed and implemented. First, on top of the two-dimensional color image, multi-dimensional information such as the corresponding depth and edge maps is added for feature extraction and fusion, and the preprocessed images are fed into a decision neural network. Based on the DPPO algorithm, the decision network is improved by adding 3D convolution and an LSTM structure, which greatly improves the exploration efficiency and learning ability of the algorithm, so that it can better understand the scene and make more intelligent strategic decisions. Compared with the original algorithm, the improved algorithm increases the stable average reward by 9.25%, effectively alleviating the problem of an excessively large exploration space and thus reducing the impact of sparse reward.
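The reward-point mechanism described in the first contribution can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the number of sampled points, the bonus value, and the claiming radius are all assumed parameters, and the predicted trajectory is taken as given (in the thesis it comes from optical flow).

```python
import numpy as np

def make_reward_points(trajectory, n_points=5):
    """Sample evenly spaced spatial reward points along a predicted
    target trajectory (an (N, d) array of positions)."""
    idx = np.linspace(0, len(trajectory) - 1, n_points).astype(int)
    return trajectory[idx]

def shaped_reward(agent_pos, reward_points, claimed, main_reward,
                  bonus=0.1, radius=1.0):
    """Add a small bonus the first time the agent comes within `radius`
    of each reward point, densifying the sparse main reward and guiding
    exploration toward the target's predicted path."""
    r = main_reward
    for i, p in enumerate(reward_points):
        if not claimed[i] and np.linalg.norm(agent_pos - p) < radius:
            claimed[i] = True  # each auxiliary reward is granted only once
            r += bonus
    return r
```

Granting each bonus only once keeps the shaping signal from dominating the main reward while still pointing the agent in the right direction early in training.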
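The 3D-convolution-plus-LSTM decision network of the second contribution can be sketched as below. This is a minimal PyTorch illustration under stated assumptions: the channel counts, kernel sizes, hidden size, and action dimension are placeholders, not the thesis's actual architecture, and the four input channels stand for the fused RGB-D stack.

```python
import torch
import torch.nn as nn

class RGBDPolicyNet(nn.Module):
    """Sketch of a decision network for a PPO-style algorithm that fuses a
    short clip of RGB-D frames with a 3D convolution and an LSTM."""
    def __init__(self, in_channels=4, hidden=128, n_actions=6):
        super().__init__()
        # 3D convolution over (channels, time, height, width) captures
        # motion cues across consecutive frames, not just spatial features.
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(2, 3, 3), stride=(1, 2, 2)),
            nn.ReLU(),
        )
        # LSTM aggregates the remaining temporal dimension into one state.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state value for PPO

    def forward(self, clip):
        # clip: (batch, 4 RGB-D channels, time, height, width)
        f = self.conv3d(clip)          # -> (B, 32, T', H', W')
        f = f.mean(dim=(3, 4))         # spatial average pool -> (B, 32, T')
        f = f.transpose(1, 2)          # -> (B, T', 32) for the LSTM
        out, _ = self.lstm(f)
        h = out[:, -1]                 # last time step summarizes the clip
        return self.policy(h), self.value(h)
```

Sharing one backbone between the policy and value heads is a common PPO design choice; the depth channel lets the convolutions distinguish targets from background at similar colors.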