Research On End-to-end Deep Reinforcement Learning Control Of Intelligent Vehicle Based On PPO Algorithm

Posted on: 2022-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Wang
GTID: 2492306329988629
Subject: Vehicle Engineering
Abstract/Summary:
The field of autonomous driving is developing rapidly. The main routes to fully autonomous driving are methods based on a "perception-decision-control" pipeline and machine learning methods based on deep learning and reinforcement learning. To address the generalization problem of end-to-end imitation learning and the instability of deep reinforcement learning in the early stage of training, this thesis combines the two methods to construct an end-to-end deep reinforcement learning control model for autonomous driving. The model offers a deep understanding of the environment, good stability, and good generalization. The main research contents are as follows:

(1) This thesis studies the theoretical basis of deep learning and reinforcement learning and introduces the PPO algorithm, which is based on the actor-critic framework. On this basis, an autonomous driving model based on deep reinforcement learning is built, laying the foundation for the end-to-end deep reinforcement learning autonomous driving control model.

(2) This thesis describes the composition of the state space that serves as input to the end-to-end control model of the intelligent vehicle, explains the importance of state space design, and proposes an environmental feature extraction method that combines VAE-based image feature compression with the YOLOv4 object detection algorithm. This addresses two problems: the reinforcement learning state space contains high-dimensional images, which slows convergence in the autonomous driving task, and traffic light information is lost during image compression. The VAE encoder compresses the images, which accelerates convergence of the reinforcement learning model and lets it meet the real-time requirements of the algorithm. However, the compressed latent vector loses part of the traffic environment information. Taking traffic lights as an example, this thesis uses the YOLOv4 object detection algorithm to extract these environmental features and adds the traffic light features to the state space to solve this problem.

(3) A theoretical model of end-to-end deep reinforcement learning for autonomous driving control, based on the proximal policy optimization (PPO) algorithm, is established. The model consists of three parts: a state space feature extraction model, a PPO reinforcement learning model, and an environment interaction model. The PPO reinforcement learning model continually learns and updates its policy in the environment to accomplish the autonomous driving task. The weights of an end-to-end deep learning network pre-trained by imitation learning can be used in the actor network of the deep reinforcement learning model, which prevents the erratic driving and slow convergence that randomly initialized network weights cause at the start of training. The state space feature extraction model integrates the models described in (2) and outputs concise and complete environment feature vectors. The reward function in the environment interaction model takes into account vehicle speed and direction, collision, lane departure, traffic light passing, and global indicator factors, so that the agent can quickly learn the autonomous driving policy.

(4) After comparing mainstream autonomous driving simulation platforms, this thesis chose CARLA as the simulation platform for the research and configured the relevant simulation environment. Four autonomous driving tasks of increasing difficulty were designed. Experiments verify that the end-to-end reinforcement learning control model can accomplish these tasks well, which demonstrates the feasibility of the proposed autonomous driving scheme.
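
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective used to update the actor in that algorithm. It is not taken from the thesis: the function name `ppo_clip_loss`, its parameters, and the default clipping range of 0.2 are assumptions chosen for illustration, and the thesis may use different settings or add value-function and entropy terms.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the mean negative clipped surrogate objective for one batch.

    Hypothetical helper for illustration only. Inputs are equal-length
    sequences of floats: log-probabilities of the taken actions under the
    current and the old policy, and the estimated advantages.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)

    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # The ratio is 1.5 here, so a positive advantage is clipped at 1.2.
    print(ppo_clip_loss([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms limits how far a single update can move the policy away from the one that collected the data, which is the mechanism usually credited for PPO's training stability.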
Keywords/Search Tags: intelligent vehicle, deep reinforcement learning, proximal policy optimization algorithm, variational autoencoders, YOLOv4 algorithm