With the rapid economic development of recent decades, urbanization in China has accelerated markedly. As personal car ownership continues to rise, air pollution and urban road traffic safety challenges have grown with it. In recent years, autonomous driving has been regarded as a promising technology for addressing these problems. However, in traditional autonomous driving, the development and debugging of a driving system still face high hardware costs, long development cycles, and a complex vehicle-modification process. Traditional vehicle decision and control algorithms are still based on manually designed driving strategies, which makes the design process inflexible and complex. As an exploratory technology in the field of decision making and control, deep reinforcement learning (DRL) has great potential for autonomous driving. Instead of hand-crafting rules for the driving system, DRL treats the vehicle as an agent that interacts with the environment and learns a driving strategy by itself. Such a method could greatly simplify the complex development process of traditional autonomous driving. Therefore, this thesis studies an end-to-end decision-making algorithm for autonomous driving based on deep reinforcement learning.

First, to address the long development period, slow testing process, and high deployment cost of traditional autonomous driving, the driving-strategy experiments are carried out in a simulator. By studying the interaction mechanism between the simulation environment and the vehicle agent, an end-to-end framework named CARLA_RL is designed to simplify the interaction between simulator and agent, and a training environment is created for the vehicle agent to interact with.

Second, to address the long learning cycle and slow convergence of traditional reinforcement learning, a scheme that fuses imitation data from human drivers with reinforcement learning is studied. Human driving data obtained through imitation learning is used to optimize the traditional reinforcement learning method. In this thesis, a new replay buffer for human data is designed on the basis of the traditional DDPG algorithm, and a warm-up process is introduced to help the agent explore the environment faster. The resulting algorithm, IDDPG, markedly accelerates the agent's early self-learning process.

Third, to solve the low utilization of interaction data in the replay buffer of traditional reinforcement learning algorithms, a priority-saving method for high-reward data is studied, and the concepts of priority saving and an accelerated state of the replay buffer are proposed in this thesis. By improving the distribution of high-reward interaction data in the buffer during the later stage of agent-environment interaction, the agent improves its utilization of high-quality data samples.

Finally, the proposed algorithm and agent are implemented in the self-designed end-to-end framework CARLA_RL. Training and testing are carried out in CARLA according to its autonomous driving benchmark, CoRL2017. The results show that the proposed IDDPG algorithm achieves a faster learning rate and a better reward distribution, and outperforms the standard CoRL2017 baseline.
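The fusion of human demonstration data with DDPG's replay buffer described above could be organized along the following lines. This is a minimal sketch, not the thesis's actual implementation: the class name `HumanDataReplayBuffer`, the `human_ratio` parameter, and the warm-up rule are all illustrative assumptions.

```python
import random
from collections import deque

class HumanDataReplayBuffer:
    """Illustrative dual replay buffer: one pool holds human
    demonstration transitions, the other holds the agent's own
    interactions. Batches mix the two pools, so early training
    (the warm-up phase) can lean on human data before the agent
    pool has filled up."""

    def __init__(self, capacity=10000, human_ratio=0.5):
        self.agent_pool = deque(maxlen=capacity)
        self.human_pool = []            # fixed demonstration set
        self.human_ratio = human_ratio  # target fraction of each batch from human data

    def load_human_data(self, transitions):
        # Each transition is a (state, action, reward, next_state, done) tuple.
        self.human_pool.extend(transitions)

    def add(self, transition):
        self.agent_pool.append(transition)

    def sample(self, batch_size):
        n_human = int(batch_size * self.human_ratio)
        # Warm-up behaviour: while the agent pool is still small,
        # fill the remainder of the batch from human demonstrations.
        n_agent = min(batch_size - n_human, len(self.agent_pool))
        n_human = batch_size - n_agent
        batch = random.sample(self.human_pool, min(n_human, len(self.human_pool)))
        batch += random.sample(self.agent_pool, n_agent)
        return batch
```

In this sketch the human pool is never evicted, so the demonstrations remain available throughout training, while the agent pool behaves like a standard sliding-window DDPG buffer.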
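The priority-saving idea for high-reward data could be sketched as below. Again this is only an assumed, list-based illustration: the eviction rule (overwrite the lowest-reward stored transition once the buffer is full) is one plausible reading of "priority saving", and the names are hypothetical.

```python
import random

class PrioritySavingBuffer:
    """Illustrative 'priority saving' buffer: once full, an incoming
    transition replaces the stored transition with the LOWEST reward
    (instead of the oldest one), but only if its own reward is higher.
    The share of high-reward samples in the pool therefore grows in
    the later stage of training."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            return
        # Buffer full: locate the lowest-reward stored transition.
        worst_idx = min(range(len(self.storage)),
                        key=lambda i: self.storage[i][2])
        # Keep the new transition only if its reward is higher,
        # so high-reward samples are preferentially preserved.
        if transition[2] > self.storage[worst_idx][2]:
            self.storage[worst_idx] = transition

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```

A linear scan for the worst entry keeps the sketch short; a real buffer of DDPG scale would track the minimum with a heap or segment tree instead.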