
Research On Automatic Driving Control Decision Based On Deep Reinforcement Learning

Posted on: 2023-10-15  Degree: Master  Type: Thesis
Country: China  Candidate: Q Wang  Full Text: PDF
GTID: 2568306773459904  Subject: Master of Engineering
Abstract/Summary:
With the popularization of automobiles and the rapid development of the industrial Internet, 5G, and related technologies, research on automatic driving is attracting growing attention. The core of automatic driving is making correct decisions quickly based on the current state of the vehicle. Deep reinforcement learning (DRL) is a process in which an agent interacts with its environment, receives rewards as feedback, and then chooses its next action; by repeating this cycle, the agent eventually learns to make decisions independently. In this paper, deep reinforcement learning is applied to automatic driving control decision-making, and the improved algorithms are verified on the TORCS simulation platform. The main work is as follows.

Firstly, Proximal Policy Optimization (PPO) suffers from poor stability and difficult convergence during training. This paper analyzes KL-PPO, the PPO variant with an adaptive KL divergence penalty, in which the asymmetry of the KL divergence affects the stability of the policy update. To counter the negative impact of this asymmetry, a Correntropy Induced Metric Proximal Policy Optimization (CIM-PPO) algorithm is proposed. The correntropy-based metric characterizes the difference between the old and new policies more faithfully, so that the policy is updated more accurately and the impact of the asymmetry is reduced.

Then, the PPO algorithm samples experience uniformly at random from the experience replay buffer during training, which leads to defects such as slow convergence. This paper studies experience replay and priority trajectory replay, and redesigns the reward-based priority operations. At the same time, to prevent high variance, a Priority Trajectory Replay (PTR) with truncated importance sampling is proposed to improve the CIM-PPO algorithm. To collect more groups of trajectory experience faster, a Learner-Actor architecture interacts with multiple environments in parallel, which improves the efficiency of sampling historical experience. The algorithm improves convergence speed by learning from trajectories with high sampling priority. Its effectiveness is tested in multiple experiments on the OpenAI platform against multiple comparison algorithms.

Finally, the improved deep reinforcement learning algorithm is applied to the lane keeping task of autonomous driving control decision-making, and experiments are carried out in The Open Racing Car Simulator (TORCS) simulation environment. Analysis of each index of the experimental results verifies the effectiveness of the improved algorithm on the lane keeping task.
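The asymmetry argument behind CIM-PPO can be illustrated with a small sketch. It compares the KL divergence with the standard correntropy induced metric, CIM(x, y) = sqrt(k(0) - mean(k(x_i - y_i))) with a Gaussian kernel k. The abstract does not say how the thesis applies the metric to policies, so treating each policy as a vector of action probabilities, the kernel width `sigma`, and the two example distributions are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def gaussian_kernel(e, sigma):
    """Unnormalized Gaussian kernel, so that gaussian_kernel(0, sigma) == 1."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))

def cim(x, y, sigma=1.0):
    """Correntropy induced metric between two equal-length vectors.

    sigma is an assumed kernel width, not a value taken from the thesis.
    """
    n = len(x)
    # Sample estimate of correntropy: mean kernel value of the pairwise errors.
    correntropy = sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / n
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(gaussian_kernel(0.0, sigma) - correntropy, 0.0))

# Hypothetical action distributions of an old and a new policy in one state.
old_policy = [0.7, 0.2, 0.1]
new_policy = [0.4, 0.4, 0.2]

kl_old_new = kl_divergence(old_policy, new_policy)
kl_new_old = kl_divergence(new_policy, old_policy)
cim_old_new = cim(old_policy, new_policy)
cim_new_old = cim(new_policy, old_policy)

print("KL(old || new):", kl_old_new)
print("KL(new || old):", kl_new_old)
print("CIM(old, new): ", cim_old_new)
print("CIM(new, old): ", cim_new_old)
```

The two KL values differ depending on argument order, while the CIM value is the same in both directions because the kernel depends only on the squared difference. A symmetric distance of this kind is what the abstract proposes to use in place of the KL term.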
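The idea of priority trajectory replay with truncated importance sampling can be sketched as follows. The abstract gives neither the priority formula nor the quantity that is truncated, so everything here is an assumption: the class name `PriorityTrajectoryBuffer`, the priority (trajectory return shifted to be positive), the truncation threshold `clip`, and the choice to truncate the weight that corrects for non-uniform sampling.

```python
import random

class PriorityTrajectoryBuffer:
    """Hypothetical buffer that replays whole trajectories by reward-based priority."""

    def __init__(self, clip=2.0, eps=1e-3, seed=0):
        self.trajectories = []
        self.returns = []
        self.clip = clip  # assumed upper bound on importance weights
        self.eps = eps    # keeps every priority strictly positive
        self.rng = random.Random(seed)

    def add(self, trajectory):
        """Store a trajectory given as a list of (state, action, reward) tuples."""
        self.trajectories.append(trajectory)
        self.returns.append(sum(reward for _, _, reward in trajectory))

    def probabilities(self):
        """Sampling probabilities proportional to the shifted trajectory return."""
        lowest = min(self.returns)
        priorities = [g - lowest + self.eps for g in self.returns]
        total = sum(priorities)
        return [p / total for p in priorities]

    def sample(self, k):
        """Draw k trajectories with replacement; return (trajectory, weight) pairs."""
        probs = self.probabilities()
        n = len(self.trajectories)
        indices = self.rng.choices(range(n), weights=probs, k=k)
        batch = []
        for i in indices:
            # The weight 1 / (n * p_i) corrects for non-uniform sampling.
            # Truncating it bounds the variance it would otherwise introduce.
            weight = min(self.clip, 1.0 / (n * probs[i]))
            batch.append((self.trajectories[i], weight))
        return batch

buffer = PriorityTrajectoryBuffer(clip=2.0)
buffer.add([("s0", "a0", 1.0), ("s1", "a1", 0.0)])  # return 1.0
buffer.add([("s0", "a1", 5.0), ("s1", "a0", 4.0)])  # return 9.0
buffer.add([("s0", "a0", 0.0), ("s1", "a1", 0.5)])  # return 0.5

probs = buffer.probabilities()
batch = buffer.sample(4)
for trajectory, weight in batch:
    print(len(trajectory), weight)
```

High-return trajectories are replayed more often, and the truncated weights keep rarely sampled trajectories from dominating a gradient step. In a Learner-Actor setup, several actors would call `add` while a single learner calls `sample`; that parallel part is omitted here.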
Keywords/Search Tags:Deep Reinforcement Learning, Proximal Policy Optimization, Autonomous Driving, Correntropy Induced Metric, Prioritized Trajectory Replay