Overtaking in conventional driving is a highly complex maneuver requiring strong motion continuity, and improper execution can cause traffic accidents. Existing driverless overtaking control techniques are mostly based on human driving rules and implemented through physical modeling; such models have no adaptive learning capability, cannot adjust the control strategy in real time, and are poorly robust to complex road traffic environments. It is therefore important to study interactive training methods for overtaking control strategy models. Drawing on the DDPG deep reinforcement learning algorithm, this paper proposes an interactive training method for the overtaking control strategy of driverless vehicles. The model is first pre-trained on overtaking control data collected from an intelligent driver; the traditional Q* algorithm is then improved with experience-pool data constraint processing (referred to as the Q*_EP1 algorithm), and the overtaking control strategy model is formally trained with Q*_EP1. Compared with the traditional Q* algorithm, Q*_EP1 significantly reduces the data size and improves data quality, but its training efficiency remains low. To solve this problem, Q*_EP1 is further improved with cluster analysis, yielding the Q*_EP2 algorithm. Q*_EP2 applies bisecting K-means to a smaller overtaking control dataset to obtain a clustering model M; M is then used to cluster the training data, and the most representative samples are drawn from each resulting cluster by equidistant, equal-proportion random sampling to train the overtaking control strategy model. Q*_EP2 therefore not only reduces the data size and improves data quality but also avoids data redundancy. Simulation analysis shows that Q*_EP2 effectively improves the learning performance of the overtaking control strategy model in terms of training efficiency, overtaking control performance, and generalization ability. Over 100 trials, the training time of Q*_EP2 is about 75% shorter than that of Q*_EP1, and its training efficiency is far higher than that of the traditional Q* algorithm. In a simulation of 20 laps (about 88 km) on a test road of much higher complexity than the training road, the vehicle completes 18 more overtakes than with Q*_EP1 and 21 more than with the traditional Q* algorithm. In addition, through exploratory research and single-factor analysis, relatively optimal values of the experience pool size, number of clusters, and sampling ratio are determined, and a simulation experiment is run under this configuration until Q*_EP2 converges. The results show that, under this relatively optimal parameter configuration, Q*_EP2 achieves higher training efficiency and faster convergence, with an iteration cost at convergence below 43.2% of that of the traditional Q* algorithm. The proposed Q*_EP2 overtaking control strategy algorithm thus offers high training efficiency, strong generalization ability, fast convergence, and good overtaking control performance, and provides a useful reference for research on driverless overtaking control.
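The abstract does not spell out what the "experience-pool data constraint processing" behind Q*_EP1 consists of. Purely as an illustrative sketch, the Python fragment below reads it as an admission rule on a bounded experience pool: transitions are screened before storage so the pool stays smaller and of higher quality. The class name, the reward-threshold rule, and all parameter values are assumptions, not the paper's definition.

```python
# Hypothetical reading of Q*_EP1's experience-pool constraint: transitions
# are screened before entering a bounded pool, keeping the stored data
# small and of higher quality. The reward-threshold rule and all names
# here are illustrative assumptions, not the paper's definition.
from collections import deque

class ConstrainedExperiencePool:
    def __init__(self, capacity: int = 50_000, min_reward: float = 0.0):
        self.pool = deque(maxlen=capacity)  # oldest transitions evicted first
        self.min_reward = min_reward

    def add(self, state, action, reward, next_state, done) -> bool:
        """Admit a transition only if it passes the quality constraint."""
        if reward < self.min_reward:
            return False                    # discard clearly poor experience
        self.pool.append((state, action, reward, next_state, done))
        return True
```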
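Similarly, the Q*_EP2 data-selection stage described above (bisecting K-means to build a clustering model M, then representative sampling from each cluster) can be sketched as follows, assuming the samples form a numeric feature matrix. scikit-learn's BisectingKMeans (version 1.1 or later) stands in for the paper's binary K-means, and only the equal-proportion random part of the stated "equidistant, equal-proportion, random" sampling rule is shown; every function name and parameter value is an illustrative assumption.

```python
# Minimal sketch of Q*_EP2 data selection: fit a clustering model M on a
# small reference set, label the full training data with M, then draw the
# same fraction from every cluster so each driving situation stays
# represented. Parameter values (k, ratio) are illustrative only.
import numpy as np
from sklearn.cluster import BisectingKMeans

def build_cluster_model(reference_set: np.ndarray, k: int = 8) -> BisectingKMeans:
    """Fit the clustering model M on a smaller overtaking-control dataset."""
    return BisectingKMeans(n_clusters=k, random_state=0).fit(reference_set)

def representative_sample(model: BisectingKMeans, training_data: np.ndarray,
                          ratio: float = 0.2, seed: int = 0) -> np.ndarray:
    """Label the training data with M and sample equally from each cluster."""
    rng = np.random.default_rng(seed)
    labels = model.predict(training_data)
    picks = []
    for c in range(model.n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:                  # a cluster may receive no points
            continue
        n = max(1, int(idx.size * ratio))  # equal proportion per cluster
        picks.append(rng.choice(idx, size=n, replace=False))
    return training_data[np.concatenate(picks)]
```

In this reading, the subset returned by representative_sample would replace uniform draws from the full experience pool when updating the DDPG-style actor and critic, which is what lets Q*_EP2 cut training time while avoiding redundant data.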