| In recent years, artificial intelligence-based applications have outperformed conventional methods in a growing number of fields. Among them, autonomous driving has become one of the most important application scenarios for artificial intelligence. Autonomous driving technology can avoid safety hazards caused by driver error and can greatly improve vehicle comfort and intelligence. Current autonomous driving decision-making mainly relies on rule-based methods, which define the behavior rules of the intelligent vehicle during driving through preset "expert knowledge". However, highway autonomous driving is a complex scenario with a large state space, a continuous action space, and high speeds, and rule-based methods struggle to meet its requirements. Reinforcement learning is an experience-driven, autonomous learning method in which an agent learns the optimal policy for a task through continuous trial-and-error interaction with the environment and the feedback it receives; this interaction can be modeled as a Markov decision process. Reinforcement learning has a wide range of applications in engineering problems. Deep reinforcement learning combines the powerful function-fitting ability of deep learning with the decision-making ability of reinforcement learning, which provides new solutions to complex problems. It is therefore a feasible approach to autonomous driving of intelligent vehicles in highway scenarios. Lane following and autonomous overtaking are two typical scenarios in highway autonomous driving. In this paper, deep reinforcement learning is applied to the decision-making modules for these two scenarios. The main research contents are as follows: First, the deep reinforcement learning algorithms for highway autonomous driving decision-making are improved. The two deep reinforcement learning algorithms commonly
used in autonomous driving decision-making, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), are improved to make them better suited to decision-making modules in highway autonomous driving scenarios. For DDPG, this paper makes targeted improvements and proposes a Deep Deterministic Policy Gradient algorithm based on dual critics and a prioritized experience replay mechanism (Double Critic and Priority Experience Replay Deep Deterministic Policy Gradient, DCPER-DDPG). A dual-critic network is used to mitigate the degradation of the driving policy caused by overestimation of the Q value. The temporal-difference error generated when the actor network is updated makes the algorithm model inaccurate, and a delayed update method is used to reduce this effect. To address the poor sampling quality caused by random experience replay in DDPG, and the computing power and resources consumed by slow training, this paper adopts a prioritized experience replay mechanism. For PPO, this paper introduces a curiosity mechanism to improve the efficiency with which the autonomous vehicle explores the environment, and changes the network update method from gradient ascent to RMSProp to train the agent more fully; the resulting algorithm is Proximal Policy Optimization based on the curiosity mechanism and RMSProp (Curiosity and RMSProp Proximal Policy Optimization, CR-PPO). Second, the lane-following decision in the highway scenario is modeled and verified in the simulation system. According to the task requirements, TORCS is selected as the simulation environment, the state space and action space are defined, and the reward function is designed. Next, the actor and critic network structures used by the two algorithms are designed. An experiment is then designed to verify the lane-following decision-making module. Third, the autonomous overtaking decision in the highway scenario is modeled and verified in
the simulation system. Based on the characteristics of highway autonomous overtaking, highway-env is selected as the simulation environment, the state space and action space of the reinforcement learning algorithm are defined, and the reward function is designed around safety, efficiency, and comfort. The neural network structures of the two algorithms are then designed. Finally, according to the task requirements, experiments are designed to verify the feasibility of applying reinforcement learning to the decision-making module for highway autonomous overtaking and to compare the performance of the two algorithms. |
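
The abstract names two ingredients of DCPER-DDPG, a dual critic and prioritized experience replay, without giving their exact form. The following is a minimal sketch of how such ingredients are commonly written, not the paper's implementation. It assumes the dual critic bootstraps from the smaller of two Q estimates and that replay priorities are proportional to the absolute temporal-difference error. All function names and the values of `gamma`, `alpha`, and `eps` are hypothetical, and importance-sampling corrections are omitted.

```python
import random


def double_critic_target(reward, q1_next, q2_next, done, gamma=0.99):
    """TD target that bootstraps from the smaller of two critic estimates.

    Assumption: taking the minimum is one common way to counter Q-value
    overestimation; the paper may combine its critics differently.
    """
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def priority_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha.

    Assumption: proportional prioritization; alpha and eps are illustrative.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=random):
    """Draw replay-buffer indices (with replacement) using the priorities."""
    probs = priority_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

Under these assumptions, transitions with larger temporal-difference errors are replayed more often than under uniform sampling, which is the motivation the abstract gives for replacing random experience replay.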