As an important component of intelligent transportation systems, intelligent vehicles are an active research topic in vehicle engineering because they help alleviate traffic congestion and reduce traffic accidents. Among the technologies with which intelligent vehicles are equipped, behavior decision making is one of the key technologies of autonomous driving and plays a decisive role in driving safety and overall vehicle performance. Among the various behavior decision methods, those based on meta-reinforcement learning offer high learning efficiency and good robustness, and have important research value. However, current meta-reinforcement learning algorithms for behavior decision making of driverless vehicles require computing second-order derivatives of the loss function, which is computationally intensive. To address this problem, this paper combines the Reptile first-order meta-learning algorithm with the Proximal Policy Optimization (PPO) reinforcement learning algorithm, proposes the Meta-PPO meta-reinforcement learning algorithm, and applies it to driverless vehicle behavior decision making. The specific research contents of this paper are as follows.

(1) The Meta Proximal Policy Optimization (Meta-PPO) meta-reinforcement learning algorithm is proposed. The innovation of this algorithm is that it builds the Reptile meta-learning algorithm on top of the original PPO algorithm. Reptile is used to find good initial parameters for the model, which reduces the time the model needs to learn a new task. Because Reptile is a first-order method, no second-order derivatives are computed, which reduces the computational overhead.

(2) A Meta-PPO-based approach to driverless vehicle behavior decision making is investigated. For the behavior decision problem when there are no other obstacles, such as pedestrians and vehicles, on the road, a driverless decision method based on the Meta-PPO algorithm is designed. It directly outputs actions such as acceleration and deceleration from the numerical readings of sensors such as speed sensors and distance sensors, and thus performs end-to-end decision control of the vehicle's behavior. Experimental results on the autonomous driving simulation platform show that the Meta-PPO-based decision method converges better than the standard PPO algorithm, and that the vehicle can complete a full lap of the training track. In addition, the vehicle using the Meta-PPO algorithm was also able to complete a full lap of a more difficult test track with greater curvature, which indicates good generalization.

(3) A reinforcement learning-based decision making method for driverless vehicles in a multi-vehicle environment is investigated. To address the decision problem when multiple driverless vehicles are on the road, a multi-vehicle decision method based on the PPO reinforcement learning algorithm is first proposed, in which a centralized policy network is trained to make decisions on the behavior of all vehicles. However, this method cannot solve the problem of the environment becoming non-stationary when multiple vehicles learn at the same time. This paper therefore proposes a multi-agent proximal policy optimization algorithm based on PPO, designs a multi-vehicle decision model based on it, and verifies the effectiveness of the method through experiments on the autonomous driving simulation platform.

In summary, this paper investigates the behavior decision problem in two scenarios: single-vehicle and multi-vehicle environments. For the single-vehicle environment, the Meta-PPO meta-reinforcement learning algorithm is proposed and a single-vehicle behavior decision model based on Meta-PPO is developed. For the multi-vehicle environment, a multi-vehicle decision model based on the PPO algorithm is proposed and a multi-vehicle decision model based on multi-agent proximal policy optimization is developed. Finally, simulation experiments are carried out on the TORCS autonomous driving simulation platform to verify the effectiveness of the models.
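
The combination of Reptile with PPO described in (1) can be illustrated with a minimal sketch. It is not the implementation used in this work, and it rests on several assumptions: the function names (`ppo_clip_objective`, `inner_update`, `reptile_meta_train`), the step sizes, and the toy tasks are all hypothetical. In particular, the inner loop is a stand-in: it runs plain gradient descent on a quadratic loss where a real system would run several PPO updates on trajectories collected from the simulator. The sketch only shows why the outer update is first-order: the initial parameters are moved toward the task-adapted parameters, and no second-order derivatives appear.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sample
    advantage : advantage estimate for each sample
    clip_eps  : clipping range (0.2 is a common default, assumed here)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))


def inner_update(theta, task_target, steps=5, lr=0.1):
    """Placeholder for a few PPO updates on a single task.

    ASSUMPTION: the task loss is the toy quadratic
    0.5 * ||theta - task_target||^2, whose gradient is (theta - task_target).
    A real inner loop would ascend ppo_clip_objective on sampled rollouts.
    """
    phi = theta.copy()
    for _ in range(steps):
        phi -= lr * (phi - task_target)
    return phi


def reptile_meta_train(theta, tasks, meta_iters=200, meta_lr=0.5, seed=0):
    """Reptile outer loop: sample a task, adapt, then interpolate.

    The update theta <- theta + meta_lr * (phi - theta) uses only the
    adapted parameters phi, so no second-order derivatives are needed.
    """
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(meta_iters):
        target = tasks[rng.integers(len(tasks))]  # sample one task
        phi = inner_update(theta, target)         # task-specific adaptation
        theta += meta_lr * (phi - theta)          # first-order meta-update
    return theta


# Hypothetical toy tasks: each task is defined by a target parameter vector.
tasks = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta0 = np.array([5.0, 5.0])
theta_meta = reptile_meta_train(theta0, tasks)
```

In this toy setting the meta-learned `theta_meta` ends up among the task targets, so a few inner steps from it reach any single task much faster than the same steps from `theta0`. This mirrors the role of the learned initialization described above, although it says nothing about behavior in the driving simulator.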