With the growing popularity of automobiles, travel has become increasingly convenient, but the number of traffic accidents has also increased. Artificial intelligence is regarded as an effective way to prevent traffic accidents caused by human error. However, developing an autonomous driving system that can handle all kinds of conditions remains a very challenging problem. In this thesis, the Deep Deterministic Policy Gradient (DDPG) algorithm is used as the base algorithm for training the driving policy. The algorithm perceives the current environment using only distance, running-speed, and rotation-speed data, without image data, and controls the throttle, brake, and steering wheel. Through analysis of the model and observation of the training process, this thesis proposes four improvements to the algorithm. First, evaluation of the target network shows that the target-network update is subject to uncertainty, which makes the performance of the target policy network unstable during training; to mitigate this problem, a simulated-annealing method is proposed for updating the target policy network. Second, to reduce the high variance of the Q-value estimate caused by the estimation error of the policy network, and based on the idea that similar actions should obtain similar rewards, Gaussian noise is added to smooth the estimate. Third, the trace of reward changes during training shows that the original reward function sometimes fails to give the car a definite reward; the reward function is improved to solve this problem. Fourth, building on the idea of exploring by adding noise to model parameters, Gaussian noise is added to the parameters of each layer of the online policy network to explore suboptimal solutions. The models are trained on the TORCS platform. Experimental results show that the four improvements proposed in this thesis effectively improve the control performance of the algorithm.
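
To make the simulated-annealing target update concrete, the following is a minimal sketch of one possible form of such a rule, assuming PyTorch-style networks; the names annealed_target_update, score_online, and score_target are hypothetical, and the thesis's exact acceptance criterion and temperature schedule are not specified in the abstract.

    import math
    import random

    def annealed_target_update(online_net, target_net,
                               score_online, score_target, temperature):
        # Copy the online weights into the target network whenever the online
        # policy evaluates at least as well; otherwise accept the update only
        # with a probability that shrinks as the temperature anneals to zero.
        delta = score_target - score_online  # > 0: target currently performs better
        if delta <= 0 or random.random() < math.exp(-delta / max(temperature, 1e-8)):
            target_net.load_state_dict(online_net.state_dict())

In a typical simulated-annealing schedule the temperature would be decayed between evaluations, for example geometrically (temperature *= 0.99 after each update attempt), so that late in training only improving updates are accepted.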
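
The Gaussian-noise smoothing in the second improvement resembles the target-policy smoothing used in TD3. The sketch below illustrates that idea under the assumption of PyTorch networks with signatures actor_target(states) and critic_target(states, actions); the noise scale and clipping bounds are arbitrary placeholders, not values from the thesis.

    import torch

    def smoothed_target_q(critic_target, actor_target, next_states,
                          noise_std=0.2, noise_clip=0.5, act_limit=1.0):
        # Add clipped Gaussian noise to the target action before evaluating
        # the target critic, so that nearby actions receive similar value
        # estimates and the Q-target variance is reduced.
        with torch.no_grad():
            action = actor_target(next_states)
            noise = (torch.randn_like(action) * noise_std).clamp(-noise_clip, noise_clip)
            action = (action + noise).clamp(-act_limit, act_limit)
            return critic_target(next_states, action)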
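
For context on the third improvement, a reward of the form often used in TORCS-based DDPG examples is sketched below; the abstract does not specify the improved reward, so only this commonly used baseline is shown, and torcs_style_reward is a hypothetical name.

    import math

    def torcs_style_reward(speed_x, angle, track_pos):
        # Progress-style reward frequently used with TORCS: reward speed along
        # the track axis, penalize lateral speed and deviation from the
        # centerline of the track.
        return speed_x * (math.cos(angle) - abs(math.sin(angle)) - abs(track_pos))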
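
The fourth improvement, exploring in parameter space, can be sketched as follows, assuming a PyTorch actor module; the noise scale sigma is an assumption, and in published parameter-noise methods it is typically adapted during training rather than held fixed.

    import copy
    import torch

    def perturbed_actor(actor, sigma=0.05):
        # Return a copy of the online policy whose weights in every layer are
        # perturbed with independent Gaussian noise; the copy is used only to
        # generate exploratory actions, leaving the learned weights untouched.
        noisy = copy.deepcopy(actor)
        with torch.no_grad():
            for param in noisy.parameters():
                param.add_(torch.randn_like(param) * sigma)
        return noisy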