Although autonomous driving technology has made great progress in recent years, it has been commercialized only in specific closed, low-speed scenarios. How to improve the safety and human-likeness of autonomous vehicles and the efficiency of traffic has not been fully solved in complex traffic scenarios such as congested on-ramps and roundabouts. To improve the comfort and safety of autonomous vehicles, traffic efficiency, and the rate of successful lane-merging on ramps, this paper carries out the following research.

Firstly, an improved reward function for the reinforcement learning algorithm TD3 is proposed. Two defects of the reward functions in existing research on reinforcement-learning-based lane-merging policies are analysed: they have difficulty balancing successful merging, safety, comfort, and traffic efficiency, and they are so static that they evaluate the policy inaccurately. Each term of the reward function is therefore normalized and weighted so that the orders of magnitude are consistent, and a dynamic reward function depending on vehicle speed and human driver perception-reaction time is designed. The results show that the improved dynamic reward function evaluates the policy more accurately, so that the learned lane-merging policy generalizes better in terms of merging success rate, safety, comfort, and traffic efficiency.

Secondly, a lane-merging policy based on an improved TD3 is proposed. Analysis verifies that the state-action value function Q is underestimated in the TD3 algorithm and that this underestimation should be reduced. Since model uncertainty reflects the confidence of the estimate, the sample variance of the Q estimator, which represents the model uncertainty, is used to form a weighted average with the unbiased sample mean, yielding an improved TD3 with a better Q estimator. The results show that the Q estimator of the improved TD3 is more accurate than that of the vanilla TD3, and that the lane-merging policy learned through the improved TD3 generalizes better than the one learned through the vanilla TD3 in terms of merging success rate, safety, comfort, and traffic efficiency.

Finally, a lane-merging policy is learned through offline reinforcement learning. The distribution shift between the behavior policy and the target policy is analysed, and the effectiveness of two remedies, a support constraint and subtracting the quantified model uncertainty from the Q estimate, is verified. The inequality-constrained optimization with the support constraint is transformed into its Lagrange dual problem, and a lane-merging policy is learned through an offline reinforcement learning algorithm built on the improved TD3. Tests show that, compared with the lane-merging policy learned through imitation learning, the one learned through the offline reinforcement learning algorithm generalizes better in terms of merging success rate, safety, comfort, and traffic efficiency.
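The normalized, weighted, dynamic reward described above can be sketched as follows. The term definitions, the weights, the 3 m/s² comfort scale, and the form of the speed-dependent safety margin are illustrative assumptions, not the thesis's exact function; the sketch only shows how normalizing each term to a common magnitude and tying the safety margin to speed and perception-reaction time could fit together.

```python
def merging_reward(v, v_target, accel, ttc, t_react, merged, collided,
                   w=(0.3, 0.2, 0.3, 0.2)):
    """Illustrative normalized, weighted reward for on-ramp merging.

    Each term is scaled into [-1, 1] so the orders of magnitude are
    consistent, and the safety term is dynamic: the required
    time-to-collision (TTC) margin grows with speed v and with the
    assumed human perception-reaction time t_react.
    All scales and weights here are assumptions for illustration.
    """
    r_eff = min(v / v_target, 1.0)                # traffic efficiency: reward speeds up to the target
    r_comfort = -min(abs(accel) / 3.0, 1.0)       # comfort: penalize accelerations above ~3 m/s^2
    ttc_margin = t_react + v / 10.0               # dynamic safety margin (assumed form)
    r_safe = min(ttc / ttc_margin, 1.0) - 1.0     # 0 when TTC comfortably exceeds the margin
    r_merge = 1.0 if merged else (-1.0 if collided else 0.0)
    w_eff, w_comf, w_safe, w_merge = w
    return w_eff * r_eff + w_comf * r_comfort + w_safe * r_safe + w_merge * r_merge
```

Because every term lies in a bounded range, the weights alone decide the trade-off between merging success, safety, comfort, and efficiency, rather than whichever term happens to have the largest raw magnitude.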
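The uncertainty-weighted Q estimator in the improved TD3 can be illustrated with a minimal sketch: the pessimistic minimum over an ensemble of critics (the vanilla TD3 target, which is biased low) is blended with the unbiased sample mean, using the sample variance of the estimates as the measure of model uncertainty. The specific weighting form `1/(1 + scale * var)` is an assumption for illustration; the thesis's exact weighting may differ.

```python
import numpy as np

def blended_q_target(q_values, lam_scale=1.0):
    """Blend the pessimistic minimum and the unbiased mean of an
    ensemble of Q estimates for one (state, action) pair.

    q_values : 1-D array of the critics' Q estimates.
    When the critics agree (low sample variance, low model
    uncertainty), the weight lam approaches 1 and the target leans on
    the unbiased mean; when they disagree, lam shrinks and the target
    leans on the conservative minimum.  (Illustrative weighting only.)
    """
    q_min = q_values.min()
    q_mean = q_values.mean()
    var = q_values.var(ddof=1)           # sample variance = model uncertainty proxy
    lam = 1.0 / (1.0 + lam_scale * var)  # high variance -> small lam -> lean on min
    return lam * q_mean + (1.0 - lam) * q_min
```

The blended target always lies between the vanilla-TD3 minimum and the sample mean, which is how the underestimation bias gets reduced without discarding pessimism where the critics genuinely disagree.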
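The second offline-RL remedy, subtracting the quantified model uncertainty from the Q estimate, can be sketched as a pessimistic bootstrap target: the ensemble standard deviation of the next-state Q estimates is treated as the uncertainty and subtracted before discounting, so that out-of-distribution actions, where the critics disagree, are devalued. The parameter names and the choice of the standard deviation as the penalty are assumptions for illustration.

```python
import numpy as np

def pessimistic_target(reward, gamma, next_q_values, beta=1.0):
    """Illustrative offline-RL target with an uncertainty penalty.

    next_q_values : 1-D array of ensemble Q estimates for (s', a').
    The ensemble standard deviation quantifies model uncertainty;
    subtracting beta times it penalizes actions outside the support of
    the behavior policy, where distribution shift makes the critics
    disagree.  (Sketch only; beta and the penalty form are assumed.)
    """
    q_mean = next_q_values.mean()
    q_std = next_q_values.std(ddof=1)    # quantified model uncertainty
    return reward + gamma * (q_mean - beta * q_std)
```

In-distribution transitions, where the critics agree, are left essentially unpenalized, while disagreement shrinks the target, steering the learned policy back toward the support of the behavior data.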