With the development of automobile technology, the problems of environmental pollution and traffic safety in modern society have become increasingly serious. In this context, intelligent vehicles with safe autonomous driving systems have become a hot development direction. Among the various sophisticated technologies of intelligent vehicles, the control of unmanned vehicles plays a critical role in vehicle safety and is a problem of wide concern. Traditional control methods have certain limitations. When the environment is excessively complex, the controller parameters become correspondingly complex. Moreover, the previously defined parameters must be adjusted whenever the vehicle's environment changes, yet they are often difficult to adjust adaptively. Deep reinforcement learning algorithms have demonstrated excellent intelligent control performance and can achieve end-to-end control, which makes them more suitable for the control of unmanned vehicles. Therefore, this article studies vehicle control based on deep reinforcement learning: it first applies the Deep Deterministic Policy Gradient algorithm to virtual vehicle control, and then classifies the training data and samples the classes with different probabilities to improve the training speed. The main work of this paper is as follows:

(1) The Deep Deterministic Policy Gradient algorithm is applied to realize the control of a virtual vehicle. The algorithm takes high-dimensional perception data as input. The neural network computes the output value of the control action from these data, which is then converted into an action command and executed by the virtual car. Through further interaction with the environment, a better strategy is obtained via reinforcement learning. This paper analyzes the design methods of reward functions in different scenarios and proposes a reward function appropriate for the scenario in this paper.

(2) Training the Deep Deterministic Policy Gradient algorithm may involve a large amount of trial and error. To reduce the trial-and-error behaviors of the virtual vehicle and at the same time improve the training speed, the training data are classified and processed in this paper. Good data, which are beneficial to the vehicle's goals, are stored in the good data block, while the remaining data are kept in the bad data block. During training, data are drawn from the two blocks with different probabilities, which speeds up training.

(3) The TORCS platform is used for experimental simulation. The experimental results show that the designed reward function converges well, and verify that classifying and processing the training data can improve the training speed and reduce the trial-and-error behaviors of the virtual vehicle during learning.
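
The classification of training data into a good data block and a bad data block, sampled with different probabilities, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the class name `ClassifiedReplayBuffer`, the reward-threshold rule for deciding which data count as good, and the default values of `capacity`, `good_prob`, and `reward_threshold` are all hypothetical.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Replay memory split into a 'good' block and a 'bad' block (illustrative)."""

    def __init__(self, capacity=10000, good_prob=0.7, reward_threshold=0.0, seed=None):
        # Each block is a bounded FIFO; the oldest transitions are dropped first.
        self.good = deque(maxlen=capacity)
        self.bad = deque(maxlen=capacity)
        self.good_prob = good_prob                # assumed probability of drawing from the good block
        self.reward_threshold = reward_threshold  # assumed criterion for "good" data
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        # Assumption: a transition is "good" if its reward exceeds the threshold.
        if reward > self.reward_threshold:
            self.good.append(transition)
        else:
            self.bad.append(transition)

    def __len__(self):
        return len(self.good) + len(self.bad)

    def sample(self, batch_size):
        if len(self) == 0:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            use_good = self.rng.random() < self.good_prob
            # Fall back to the other block if the chosen one is still empty.
            if use_good and len(self.good) > 0:
                block = self.good
            elif len(self.bad) > 0:
                block = self.bad
            else:
                block = self.good
            batch.append(self.rng.choice(block))
        return batch
```

In a training loop, the minibatch returned by `sample` would be used for the critic and actor updates in the same way as a batch drawn from an ordinary replay buffer; only the sampling distribution over stored transitions changes.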