Rapidly developing autonomous driving technology offers new ways to reduce traffic accidents and alleviate congestion. The behavior decision-making subsystem is a core component of an autonomous driving system, and the ability to make decisions comparable to those of a human driver is a key marker of a highly intelligent vehicle. Rule-based decision-making solutions deployed to date are limited by the designer's prior knowledge and cannot cover complex, ever-changing traffic scenarios. Deep reinforcement learning is increasingly being applied to autonomous driving, providing new ideas for the design of decision-making systems. This thesis combines the strength of reinforcement learning, which iteratively optimizes a strategy through interaction with the environment, with the representational power of deep neural networks, and studies their application to decision-making systems. Building on current research on deep reinforcement learning for autonomous driving, this thesis designs a behavior decision-making system within a modular autonomous driving architecture, based on the Soft Actor-Critic (SAC) reinforcement learning framework, which offers rapid convergence and robustness. Compared with many "end-to-end" solutions, a modular system is easier to maintain and debug, and the model's decision strategy is still planned before execution, ensuring that actions are reasonable and executable.

The simulation environment is the basis of deep reinforcement learning strategy iteration. After determining the software architecture and writing the upstream and downstream module algorithms, a simulation environment is built in which the agent interacts with, and improves within, the environment. PreScan is used to construct traffic scenes, CarSim provides the vehicle model, and the overall interactive simulation environment is assembled in Matlab/Simulink. A traffic-flow plug-in generates intelligent traffic flow, ensuring that the environment varies across training episodes. The upstream and downstream modules of the decision pipeline, including perception and planning/control, are established by combining software interfaces with custom algorithms.

The main difficulty of this work is connecting the decision-making agent with the planning module and ensuring that decision results are effectively executed. To achieve this, the decision-making and planning modules adopt a path-speed decoupled form. A path decision-planning and speed decision-planning solution is developed in the Frenet coordinate system, and the decision strategy is post-processed so that an executable trajectory can be obtained after planning.

A SAC vehicle decision-making agent is built and its soundness verified through simulation experiments, and a Long Short-Term Memory (LSTM) neural network is introduced to explore its impact on decision performance. The state space comprises the motion information of the ego vehicle and vehicles in the surrounding lanes in the Frenet coordinate system; the action space comprises lane recommendation, lateral offset within the lane, and speed. Polynomials are used to construct the path and velocity curves, and a convex space meeting the planning requirements is built from the decision results. The reward function fully considers safety, efficiency, and comfort, with parameters such as following distance and the change rates of longitudinal and lateral acceleration as its variables. To improve the rationality of the agent's decisions, the discrepancy between the decision and planning results in path and speed is introduced into the reward function and fed back into the policy-improvement process, reducing planning difficulty.

To compare strategy differences between models, simulation experiments are designed with the Intelligent Driver Model (IDM) and a lane-changing model based on Gipps' safe distance as baselines. Metric comparisons in specific scenarios and statistical analysis of mixed-scenario simulation tests reveal the strategy differences. The vehicle decision-making agent studied in this thesis shows advantages in safety and efficiency. In particular, lateral decision-making within the lane increases flexibility in handling obstacle scenarios, although continuity and comfort remain limited by the algorithm. Comparing the LSTM-augmented SAC agent against the plain SAC agent in the same scenarios shows that the LSTM variant improves safety and comfort to a certain extent.
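The polynomial path and velocity construction mentioned above can be illustrated with a standard quintic-polynomial boundary-value solve, as commonly used for Frenet-frame lateral offset and speed profiles. This is a minimal sketch under assumed boundary conditions, not the thesis's exact planner; the function names and parameter values are placeholders.

```python
import numpy as np

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    """Solve for coefficients c0..c5 of x(t) = sum(c_i * t^i) so that
    position, velocity, and acceleration match the given boundary
    conditions at t = 0 and t = T."""
    A = np.array([
        [1, 0, 0,    0,      0,       0],        # x(0)
        [0, 1, 0,    0,      0,       0],        # x'(0)
        [0, 0, 2,    0,      0,       0],        # x''(0)
        [1, T, T**2, T**3,   T**4,    T**5],     # x(T)
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4],   # x'(T)
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],  # x''(T)
    ], dtype=float)
    b = np.array([x0, v0, a0, xT, vT, aT], dtype=float)
    return np.linalg.solve(A, b)

def eval_poly(c, t):
    """Evaluate the polynomial with coefficients c at time t."""
    return sum(ci * t**i for i, ci in enumerate(c))
```

For example, a lateral shift of one lane width (assumed 3.5 m) over 4 s with zero boundary velocity and acceleration yields a smooth offset curve that the downstream planner can sample into a trajectory.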
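The reward design described above, combining safety (following distance), efficiency (speed tracking), and comfort (jerk), can be sketched as a weighted sum of penalty terms. The weights, thresholds, and functional forms below are illustrative assumptions, not the thesis's actual reward function.

```python
def reward(gap, v, v_desired, jerk_lon, jerk_lat,
           w_safe=1.0, w_eff=0.5, w_comf=0.1, gap_min=10.0):
    """Illustrative composite reward for a driving agent.

    gap        -- following distance to the lead vehicle (m)
    v          -- ego speed (m/s); v_desired -- target speed (m/s)
    jerk_lon/jerk_lat -- longitudinal/lateral acceleration change rates
    All weights and the minimum-gap threshold are placeholder values.
    """
    r_safe = -w_safe * max(0.0, gap_min - gap)            # penalize short gaps
    r_eff = -w_eff * abs(v - v_desired) / max(v_desired, 1e-6)
    r_comf = -w_comf * (abs(jerk_lon) + abs(jerk_lat))    # penalize jerky motion
    return r_safe + r_eff + r_comf
```

The decision-planning discrepancy term described in the abstract would enter as an additional penalty of the same form, shaping the policy toward decisions the planner can realize.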
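The IDM baseline used for comparison follows the standard Intelligent Driver Model formulation; the parameter defaults below are typical literature values, not necessarily those used in the thesis's experiments.

```python
import math

def idm_accel(v, v_lead, gap,
              v0=33.3, T=1.5, a_max=1.5, b=2.0, s0=2.0, delta=4):
    """Standard IDM car-following acceleration.

    v, v_lead -- ego and lead-vehicle speeds (m/s); gap -- bumper gap (m)
    v0 -- desired speed; T -- desired time headway; a_max -- max accel;
    b -- comfortable decel; s0 -- minimum jam gap (parameter defaults
    are typical values, assumed here).
    """
    dv = v - v_lead  # closing speed
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

On a free road the model accelerates toward the desired speed, while a rapidly closing gap produces strong braking, which is what makes IDM a useful safety/efficiency reference point.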