With the rapid development of the automotive industry, autonomous driving has become a key research direction. Current autonomous driving solutions mainly consist of perception, decision-making, and control modules, which suffer from complex strategy formulation, tedious tuning of control parameters, and poor environmental adaptability. Deep reinforcement learning (DRL) combines the perception capability of deep learning (DL) with the decision-making capability of reinforcement learning (RL), and is widely used in autonomous driving research in an end-to-end manner that maps raw inputs directly to control outputs. However, DRL algorithms depend heavily on hyperparameter settings and adapt poorly to changing environments. In addition, during training, DRL algorithms must learn better strategies through extensive trial and error, which makes it difficult to guarantee vehicle safety and to apply them in real-world environments. To address these issues, this study built an autonomous driving simulation platform and investigated DRL-based autonomous driving control strategies.

First, this study built an autonomous driving simulation platform based on the TORCS driving simulator. Drawing on the relevant functions of TORCS and the interaction process between the vehicle and the environment, the study designed a vehicle coordinate system, a Markov decision process, a state space, an action space, and a reward function. Based on real-world traffic scenarios and algorithm evaluation needs, the study designed corresponding autonomous driving simulation tasks and conducted simulation experiments with TD3 and other deep reinforcement learning algorithms. The experimental results show that the platform can be used effectively for autonomous driving simulation experiments and for evaluating the performance of deep reinforcement learning algorithms.

Second, to address the training instability and poor environmental adaptability of the TD3 algorithm in autonomous driving tasks, this study proposed an autonomous driving control strategy based on adaptive deep reinforcement learning. To make the modeling more precise, the interaction between the agent and the environment was modeled as a partially observable Markov decision process. By combining the recurrent neural network GRU with the Critic network of TD3, a recurrent Critic network structure was designed. On top of the recurrent Critic network, an adaptive Actor-Critic network framework was built to balance global control and real-time control performance. Combining this framework with the TD3 algorithm yielded the proposed TAD3 algorithm. Simulation experiments comparing it with other algorithms show that TAD3 offers superior performance and better environmental adaptability.
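As a rough illustration of the MDP formulation on which the simulation platform described above rests, the following Python sketch shows one common way to encode the state vector, the continuous action bounds, and a shaped reward for a TORCS-style driving task. The observation keys, normalisation constants, and reward weights here are assumptions made for the sketch, not the platform's actual definitions.

```python
import numpy as np

# Illustrative state/action definitions for a TORCS-style driving MDP.
# The field choices (19 range-finder readings plus speed, heading angle,
# and lateral offset) are assumptions for this sketch.
STATE_DIM = 19 + 3                      # track sensors + [speed, angle, track_pos]
ACTION_LOW = np.array([-1.0, -1.0])     # [steering, throttle/brake]
ACTION_HIGH = np.array([1.0, 1.0])

def reward(speed: float, angle: float, track_pos: float) -> float:
    """Shaped reward: encourage progress along the track axis and penalise
    heading error and lateral deviation from the lane centre.
    The exact weights used in the thesis are not reproduced here."""
    progress = speed * np.cos(angle)        # velocity along the track axis
    drift = speed * abs(np.sin(angle))      # velocity across the track axis
    offset = speed * abs(track_pos)         # penalty for leaving the centre line
    return progress - drift - offset

def step(raw_obs: dict) -> tuple:
    """Build one MDP observation/reward/termination triple from a raw
    simulator reading. The `raw_obs` keys are hypothetical placeholders."""
    state = np.concatenate([
        np.asarray(raw_obs["track"]) / 200.0,   # normalised range finders
        [raw_obs["speed_x"] / 300.0,
         raw_obs["angle"] / np.pi,
         raw_obs["track_pos"]],
    ])
    r = reward(raw_obs["speed_x"], raw_obs["angle"], raw_obs["track_pos"])
    done = abs(raw_obs["track_pos"]) > 1.0      # car has left the track
    return state, r, done
```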
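The recurrent Critic described above can be pictured with a short PyTorch sketch: a GRU summarises a short history of observations, and the summary is fused with the action to produce a Q-value, which is how a Critic can cope with partial observability. The layer sizes and sequence handling are illustrative assumptions, not the exact network used in TAD3.

```python
import torch
import torch.nn as nn

class RecurrentCritic(nn.Module):
    """Q-network that passes an observation history through a GRU before
    combining it with the action, in the spirit of the recurrent Critic
    described above. Layer sizes are illustrative assumptions."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(hidden + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state_seq, action, h0=None):
        # state_seq: (batch, seq_len, state_dim) — a short observation history.
        out, _ = self.gru(state_seq, h0)
        summary = out[:, -1]                 # GRU summary of the history
        return self.q_head(torch.cat([summary, action], dim=-1))

# Example: evaluate Q for a batch of 8 histories of length 5.
critic = RecurrentCritic(state_dim=22, action_dim=2)
q = critic(torch.randn(8, 5, 22), torch.randn(8, 2))
print(q.shape)  # torch.Size([8, 1])
```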
Finally, to address the poor vehicle safety caused by the extensive trial and error the agent requires during training to learn driving strategies, this study proposed the PPO-ETMDP-Barrier algorithm, which combines safe reinforcement learning, control theory, and Markov decision processes with early termination. The proposed algorithm uses safe reinforcement learning methods to design safety constraints, uses barrier certification methods from control theory to keep the agent within those constraints, and uses Markov decision processes with early termination to improve the quality and efficiency of the agent's sampling. Simulation experiments comparing the PPO-ETMDP-Barrier algorithm with other algorithms demonstrate that the proposed algorithm not only achieves strong autonomous driving performance but also improves vehicle safety.
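The interplay of the safety constraint and early termination in an approach like PPO-ETMDP-Barrier can be sketched as follows: a barrier-style function marks the safe set, and a rollout is cut short with a penalty as soon as the constraint fails, so that unsafe, low-quality samples are not collected. The barrier function, the penalty value, and the callables `env_step` and `policy` are hypothetical placeholders for this sketch, not the algorithm's actual components.

```python
import numpy as np

def barrier(track_pos: float, margin: float = 0.8) -> float:
    """Barrier-style safety function h(s): non-negative while the car stays
    within `margin` of the track centre, negative once the safe set is left.
    The concrete barrier used in the thesis is not reproduced here."""
    return margin - abs(track_pos)

def rollout(env_step, policy, init_state, max_steps=1000, violation_penalty=-10.0):
    """Collect one episode, terminating early as soon as the barrier
    condition fails so that unsafe samples do not dominate training.
    Assumes a gym-style env_step(action) -> (state, reward, done, info)."""
    transitions, state = [], init_state
    for _ in range(max_steps):
        action = policy(state)
        next_state, r, done, info = env_step(action)
        if barrier(info["track_pos"]) < 0.0:       # safety constraint violated
            transitions.append((state, action, violation_penalty, next_state, True))
            break                                   # early termination of the episode
        transitions.append((state, action, r, next_state, done))
        state = next_state
        if done:
            break
    return transitions
```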