
Improvement And Application Of Deep Reinforcement Learning Based On Experience Replay Mechanism

Posted on: 2022-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Chen
Full Text: PDF
GTID: 2518306740479414
Subject: Applied Mathematics

Abstract/Summary:
The main goal of Reinforcement Learning (RL) is for an agent, through interaction with its environment, to obtain a policy that maximizes the expected return, that is, to "learn" a capability through continual trial and error. Owing to the complexity of real environments, however, traditional reinforcement learning methods converge poorly and are difficult to apply in continuous spaces. In recent years, with the rapid development of deep learning, researchers have proposed Deep Reinforcement Learning (DRL) algorithms, which combine deep learning with reinforcement learning to better address this problem; among them, the Deep Deterministic Policy Gradient (DDPG) algorithm performs well on continuous control problems. DRL has since become a hot research topic at home and abroad, has been widely applied in computer vision, natural language processing, autonomous driving, and other fields, and has produced a series of meaningful results. Research on DRL algorithms therefore has both academic value and practical significance.

This thesis carries out the following work on DRL algorithms. First, it summarizes the research on deep neural networks and the progress at home and abroad. Second, it introduces, as preliminary knowledge, Deep Convolutional Neural Networks (DCNN), Markov processes, the Q-Learning algorithm, the Deep Q-Network (DQN), the Actor-Critic framework, and the Deep Deterministic Policy Gradient algorithm.

To address the low experience utilization and slow convergence of DDPG during learning, this thesis proposes the Composite Deep Deterministic Policy Gradient (CDDPG) algorithm. Its improved experience replay mechanism reduces the correlation among the experiences generated by the agent. In addition, a composite experience-priority measure is proposed that lets the agent weigh two aspects when selecting experience samples: (1) the immediate reward, i.e., the reward obtained at the current step, which is a more robust signal; and (2) the temporal-difference error (TD error), i.e., the difference between the current action value and its estimate, whose learning effect is easily affected by outliers. Combining the two effectively improves the efficiency with which experience samples are used, helps avoid local optima, and speeds up network convergence.
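To make the composite priority concrete, the following is a minimal, hypothetical Python sketch of a replay buffer that scores each transition by a weighted combination of the absolute immediate reward and the absolute TD error, then samples by rank. The reward_weight parameter, the rank exponent alpha, and the class and method names are illustrative assumptions, not the thesis's exact formulation.

import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class CompositeReplayBuffer:
    """Rank-based prioritized replay with a composite priority (sketch)."""

    def __init__(self, capacity, reward_weight=0.5, eps=1e-6):
        self.capacity = capacity
        self.reward_weight = reward_weight  # weight on |reward| vs. |TD error| (assumed)
        self.eps = eps                      # keeps every priority strictly positive
        self.buffer = []                    # list of (priority, transition) pairs

    def composite_priority(self, reward, td_error):
        # Composite score: w * |r| + (1 - w) * |TD error| + eps (assumed weighting).
        w = self.reward_weight
        return w * abs(reward) + (1.0 - w) * abs(td_error) + self.eps

    def push(self, transition, td_error):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)              # discard the oldest transition
        priority = self.composite_priority(transition.reward, td_error)
        self.buffer.append((priority, transition))

    def sample(self, batch_size, alpha=0.7):
        # Rank-based selection: probability proportional to 1 / rank^alpha,
        # so transitions with high composite priority are replayed more often.
        ranked = sorted(self.buffer, key=lambda pair: pair[0], reverse=True)
        weights = [(rank + 1) ** -alpha for rank in range(len(ranked))]
        picks = random.choices(ranked, weights=weights, k=batch_size)
        return [transition for _, transition in picks]

In a full implementation the priorities would also be refreshed as TD errors are re-estimated during training; that bookkeeping is omitted from this sketch.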
Three reinforcement learning simulation environments, Inverted Pendulum, Ant, and Humanoid, are then established. Numerical comparisons show that CDDPG converges faster and attains a higher average return than the traditional method; the simulation results confirm that the improved algorithm uses experience more efficiently and converges more quickly.

Traditional algorithms cannot enable a mobile robot to complete continuous control tasks without a map and with sparse input. This thesis therefore builds a mobile robot system and proposes a learning-based mapless path-planning method in which CDDPG takes only the robot's laser-sensor readings, the relative position of the target, and the current velocity as input, and outputs an action pair in continuous space (this interface is sketched at the end of the abstract). The real scene of the laboratory of the School of Mathematics is modeled in Gazebo, and experiments are carried out in it. The results show that, with CDDPG, the mobile robot can be trained end to end without any hand-crafted features, preferentially selecting samples by the rank of their composite priority. The robot ultimately completes the mapless path-planning task while avoiding obstacles without collision. In principle, the trained robot can be applied directly to any map environment, which demonstrates the effectiveness and superiority of CDDPG in continuous control tasks.

Finally, the traditional DDPG algorithm and CDDPG are compared, the experience replay mechanism of CDDPG is summarized, the CDDPG-based path-planning method for the mobile robot system is reviewed, and remaining problems and prospects for future research are discussed.
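As an illustration of the mapless-navigation interface described above, the following is a minimal, hypothetical PyTorch actor sketch: laser readings, the relative target position, and the current velocity go in, and a continuous (linear velocity, angular velocity) action pair comes out. The 10-beam laser input, layer widths, output scaling, and the NavigationActor name are assumptions for illustration, not the thesis's actual network.

import torch
import torch.nn as nn

class NavigationActor(nn.Module):
    """Illustrative actor for mapless navigation (assumed architecture)."""

    def __init__(self, n_laser=10, hidden=256):
        super().__init__()
        # State: laser beams + (distance, angle) to target + (linear, angular) velocity.
        state_dim = n_laser + 2 + 2
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, state):
        raw = self.net(state)
        linear = torch.sigmoid(raw[..., :1])   # forward speed, scaled to [0, 1]
        angular = torch.tanh(raw[..., 1:])     # turning rate, scaled to [-1, 1]
        return torch.cat([linear, angular], dim=-1)

# Example: one 14-dimensional state maps to one 2-dimensional action.
actor = NavigationActor()
action = actor(torch.randn(14))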
Keywords/Search Tags:Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Deep Q Network, Experience Replay Mechanism, Path Planning, Autonomous Driving