
Research On Command Decision Method Based On Deep Reinforcement Learning

Posted on: 2021-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Yuan
GTID: 2518306047482164
Subject: Software engineering

Abstract/Summary:
Deep reinforcement learning is currently an active new direction in artificial intelligence research. It combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning to achieve direct, end-to-end control from raw input to output. Since its inception, it has made substantial breakthroughs in many decision and control tasks that require perception of high-dimensional raw input data, especially in the field of command decision-making. However, current deep reinforcement learning algorithms still suffer from low data utilization, unstable policies, and entrapment in local optima in deceptive or sparse-reward environments. To address these problems, this thesis proposes the SOBTPER-DDGBES DQN deep reinforcement learning algorithm.

With respect to experience replay, this thesis proposes the "Second-Order Backward Transfer Priority Experience Replay" (SOBTPER) method to address the low data utilization and low policy quality of current algorithms. First- and second-order priorities are constructed from the cumulative reward of a sample sequence and the TD-error of each sample, and the priority of each sample is attenuated as it is propagated backward and forward along the trajectory. This method improves data utilization from the perspectives of sequence-level cumulative reward and backward attenuated priority propagation.

With respect to the exploration-exploitation strategy, this thesis proposes the "Diversity-Driven and Greedy Boltzmann Exploration Strategy" (DDGBES) to address the slow convergence and poor stability caused by existing strategies. First, the diversity-driven component significantly enhances the agent's ability to explore the environment. Second, an adaptive schedule adjusts ε dynamically according to the rewards the agent obtains as it learns, balancing exploration and exploitation. Third, on top of the adaptive schedule, a Boltzmann strategy is introduced to solve the problem that equal-probability random actions reduce the speed and efficiency of the algorithm. Finally, the SOBTPER-DDGBES DQN deep reinforcement learning algorithm is built on the DQN algorithm.

Comparative experiments show that, in the command decision-making process, the SOBTPER experience replay method proposed in this thesis significantly improves both the acquisition of high-quality policies and data utilization compared with previous methods. For exploration and exploitation, the DDGBES method shows a clear advantage over previous strategies in balancing exploration and exploitation in command decision-making activities. Finally, a comparison with existing deep reinforcement learning algorithms verifies that the SOBTPER-DDGBES DQN algorithm has excellent overall performance in command decision-making activities: it greatly improves policy quality and learning speed, and makes command decisions more scientific and reasonable.
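As a rough illustration only (the abstract gives no formulas), the Python sketch below shows one plausible reading of two of the described components: a replay buffer whose first- and second-order priorities come from per-sample TD-error and the sequence's cumulative reward, propagated backward with attenuation, and a greedy-Boltzmann action rule with an adaptive ε. The diversity-driven term is omitted. All class names, coefficients, and the decay and ε schedules are assumptions for illustration, not the thesis's actual definitions.

```python
import numpy as np

# Illustrative sketch only: the priority weights, decay factor, and epsilon
# schedule below are assumptions, not the formulas defined in the thesis.

class SOBTPERBuffer:
    """Replay buffer with a first-order priority from per-sample TD-error and a
    second-order priority from the episode's cumulative reward, propagated
    backward along the trajectory with attenuation."""

    def __init__(self, capacity, decay=0.8, alpha=0.6, return_weight=0.1):
        self.capacity = capacity
        self.decay = decay                  # backward attenuation factor (assumed)
        self.alpha = alpha                  # sampling-skew exponent (assumed)
        self.return_weight = return_weight  # weight of cumulative-reward term (assumed)
        self.data, self.priority = [], []

    def add_episode(self, transitions, td_errors):
        # transitions: list of (s, a, r, s_next, done); td_errors: same length.
        episode_return = sum(t[2] for t in transitions)   # second-order signal
        base = np.abs(np.asarray(td_errors)) + 1e-6        # first-order signal
        base = base + self.return_weight * max(episode_return, 0.0)
        # Backward transfer: earlier samples inherit a decayed share of the
        # priority of the samples that follow them in the trajectory.
        prio = base.copy()
        for i in range(len(prio) - 2, -1, -1):
            prio[i] += self.decay * prio[i + 1]
        for t, p in zip(transitions, prio):
            if len(self.data) >= self.capacity:
                self.data.pop(0)
                self.priority.pop(0)
            self.data.append(t)
            self.priority.append(float(p))

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority^alpha.
        p = np.asarray(self.priority) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx


def ddgbes_action(q_values, epsilon, temperature=1.0):
    """Greedy-Boltzmann rule: act greedily with probability 1 - epsilon;
    otherwise sample from a Boltzmann distribution over Q-values instead of
    picking an action uniformly at random."""
    q = np.asarray(q_values, dtype=np.float64)
    if np.random.random() > epsilon:
        return int(np.argmax(q))
    logits = q / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(q), p=probs))


def adapt_epsilon(epsilon, recent_return, baseline, lo=0.05, hi=1.0, step=0.01):
    """Adaptive epsilon: exploit more when recent returns beat a running
    baseline, explore more when they fall below it."""
    new_eps = epsilon - step if recent_return > baseline else epsilon + step
    return float(np.clip(new_eps, lo, hi))
```

In this reading, the buffer and the action rule would plug into a standard DQN training loop in place of uniform replay and plain ε-greedy selection; the exact way the thesis combines the first- and second-order priorities may differ from the additive form assumed here.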
Keywords/Search Tags:Experience Replay, Exploration Strategy, Command Decision, Deep Reinforcement Learning