Autonomous driving, or self-driving, can enhance driving safety and improve traffic efficiency, and it has become a global development trend. However, existing driving strategies are mainly based on manually designed rules and models and cannot cope with complex scenarios or unexpected accidents. Deep reinforcement learning is currently a cutting-edge research direction in the field of artificial intelligence. It is a goal-oriented, autonomous learning method that has achieved remarkable results in many complex tasks and is considered a key technology on the path toward general artificial intelligence. The purpose of this thesis is to use deep reinforcement learning algorithms to learn more intelligent driving strategies in virtual environments and to assist in the training and testing of autonomous vehicles in real scenarios. The main research content of this thesis is as follows.

First, this thesis builds a virtual environment for reinforcement learning training on top of an existing autonomous-vehicle simulation platform. This virtual environment implements a universal interface based on OpenAI Gym and can simulate the complex traffic conditions of real scenarios. Second, this thesis proposes the DBSAC (Dual Buffer Soft Actor-Critic) algorithm, an improved SAC algorithm based on dual experience replay buffers. Third, for autonomous driving tasks, this thesis proposes an autonomous learning framework with deep reinforcement learning at its core. Under this framework, this thesis designs the state space, action space, reward function, and neural network structure for deep reinforcement learning; the vehicle self-attention network designed in this thesis uses a multi-head self-attention mechanism and an encoder-decoder architecture. Finally, based on the virtual simulation environment, this thesis designs two autonomous driving scenarios, a straight-driving scenario and an unprotected intersection scenario, and conducts the corresponding experiments and analysis.

Quantitative metrics recorded during the experiments show that: in both scenarios, the autonomous learning framework can learn effective strategies that complete the tasks, and the learned driving strategies outperform the traditional driving strategy; in both scenarios, the DBSAC algorithm converges faster than the original SAC algorithm; in the unprotected intersection scenario, the vehicle self-attention network both accelerates the convergence of the strategy and improves its final performance.
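To make the OpenAI Gym-based interface concrete, the following is a minimal sketch of a Gym-style driving environment. The observation and action dimensions, the class name DrivingEnv, and the placeholder simulator logic are illustrative assumptions, not the thesis's actual environment; it also uses the older single-return reset() signature of classic Gym.

```python
import numpy as np
import gym
from gym import spaces

class DrivingEnv(gym.Env):
    """Hypothetical Gym-style wrapper around a driving simulator (sketch only)."""

    def __init__(self):
        super().__init__()
        # Assumed state: ego-vehicle kinematics plus features of nearby vehicles.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(35,), dtype=np.float32)
        # Assumed continuous action: [steering, throttle/brake] in [-1, 1].
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(2,), dtype=np.float32)

    def reset(self):
        # A real implementation would reset the simulator and return the initial observation.
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # A real implementation would apply the action in the simulator,
        # compute the reward, and detect collisions or task completion.
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```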
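The abstract states only that DBSAC uses dual experience replay buffers, not how they are populated or sampled. The sketch below assumes, purely for illustration, one buffer holding all transitions and a second holding transitions from successful episodes, with minibatches mixed from both at a fixed ratio; the actual DBSAC scheme may differ.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of a dual-buffer scheme in the spirit of DBSAC (details assumed)."""

    def __init__(self, capacity=100_000, mix_ratio=0.5):
        self.main = deque(maxlen=capacity)      # all transitions
        self.success = deque(maxlen=capacity)   # transitions from successful episodes (assumption)
        self.mix_ratio = mix_ratio              # fraction of each batch drawn from the success buffer

    def add(self, transition, from_success_episode=False):
        self.main.append(transition)
        if from_success_episode:
            self.success.append(transition)

    def sample(self, batch_size):
        # Mix samples from both buffers; fall back to the main buffer when
        # the success buffer does not yet hold enough transitions.
        n_success = min(int(batch_size * self.mix_ratio), len(self.success))
        n_main = batch_size - n_success
        batch = random.sample(self.success, n_success) if n_success else []
        batch += random.sample(self.main, min(n_main, len(self.main)))
        return batch
```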
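The vehicle self-attention network is described only as combining a multi-head self-attention mechanism with an encoder-decoder architecture. A minimal PyTorch sketch consistent with that description is given below; the feature sizes, layer counts, output head, and the choice of decoding with an ego-vehicle query are assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class VehicleSelfAttention(nn.Module):
    """Sketch of an encoder-decoder self-attention network over per-vehicle features."""

    def __init__(self, feat_dim=7, embed_dim=64, num_heads=4, out_dim=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)
        # Encoder: multi-head self-attention over all observed vehicles.
        self.encoder = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Decoder: the ego vehicle's encoding attends to the encoded scene (assumption).
        self.decoder = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(),
                                  nn.Linear(64, out_dim))

    def forward(self, vehicles):
        # vehicles: (batch, num_vehicles, feat_dim); index 0 is assumed to be the ego vehicle.
        x = self.embed(vehicles)
        enc, _ = self.encoder(x, x, x)
        ego_query = enc[:, :1, :]
        dec, _ = self.decoder(ego_query, enc, enc)
        return self.head(dec.squeeze(1))
```

In an actor-critic setting such as SAC, the output head would typically be replaced by separate policy and value heads; the single output here is kept only to keep the sketch short.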