Modern society places ever higher demands on the function and efficiency of transportation. Because the behavioral intentions of traffic participants are uncontrollable, among other reasons, rational traffic planning and the realization of automated driving inevitably become difficult. The importance of modern transportation for economic development and social progress is self-evident, so scholars worldwide have carried out in-depth research on automated driving, and their work has been supported by funding and people from all walks of life. To address the problem, researchers have proposed a variety of control strategies, which can be divided into traditional rule-based methods and artificial-intelligence methods. Traditional control strategies are designed around explicit rules and emphasize interpretability, but they sometimes complicate simple problems and inevitably suffer from over-constraint, so they face many difficulties and challenges in current applications. In other words, traditional methods are hard to apply to the complex scenes that automated driving must handle. Deep reinforcement learning (DRL) addresses this well: it performs excellently on sequential decision-making problems and lends itself to complex scenes. Its basic principle resembles a child learning to walk, who masters a fairly complex control process without understanding complex principles. DRL is therefore increasingly applied to vehicle control, especially automated driving. In this paper, a DRL algorithm is used to solve the adaptive cruise problem of a semi-trailer on straight and curved roads, and the driving stability of the vehicle is also considered. The paper is organized into the following parts.

1. Load model construction and optimization. The three-axle semi-trailer is segmented at the junction of its two parts, which coincides with the tail of the tractor. The cut is vertical, so the first segment contains the whole tractor plus the front end of the semi-trailer, while the second segment consists mainly of the middle and rear of the semi-trailer; this avoids the over-constraint problem common in mechanical analysis. The mechanical changes of each segment are analyzed in detail, and the load variation is divided into static and dynamic components. First the load on the second segment, i.e. the rear axles of the trailer, is calculated, and then the loads on the front and rear axles of the tractor in the first segment are derived. The load transfer ratio (LTR) is used as the criterion for judging whether the vehicle is approaching a dangerous state and how pronounced that tendency is. Because the tractor and the trailer differ in their sensitivity to roll, the judgment must use a different threshold for each. Since the center-of-mass position parameters required by the model are hard to obtain and change slightly as the vehicle moves, parameter identification is needed to obtain more accurate values, improve computational accuracy, and yield more accurate load values. The filtering method used in this paper is the cubature Kalman filter. First, the dynamics of the semi-trailer are analyzed, and its equations of motion and driving expressions are established; the center-of-mass position parameters are then expressed through these equations. Taking the more accurate center-of-mass states in TruckSim as reference, stably converging estimates of the center-of-mass height and of the horizontal distance from the center of mass to the first axle of the semi-trailer are computed. When tuning the hyperparameters, the sensor noise and the noise of the derived center-of-mass equations are estimated according to
prior experience. Because the mechanical analysis behind the load model is relatively simple and prone to a certain systematic error, this paper applies a semi-empirical correction based on the model's characteristics. The main source of error is that amplification in the rear half of the semi-trailer produces a large load error on the vehicle's two rear axles. The damping and stiffness of the semi-trailer suspension influence the vertical load in a complex way, so the force generated by the anti-roll stabilizer bar is adjusted to compensate the load on the rear two axles. In addition, to improve the load fit, the roll angles and roll accelerations of the tractor and trailer are used in a quartic polynomial fit, bringing the accuracy of the three-axle semi-trailer load model close to the values in the simulation software.

2. Construction of the vehicle scene simulation environment. Drivers with different driving styles have different expected accelerations and yaw rates, so their action-space ranges differ. Three typical driver styles are distinguished through a multi-dimensional driving-style scale. Using collected driver information and theoretical derivation, the action space and state space are scaled, which speeds up agent training, and the DDPG network structure is initially constructed. Guided by the specific control requirements, the state space is reduced: weakly correlated state components are removed while the essential features are retained, which indirectly accelerates the reinforcement-learning process. The two agents are trained separately, and the different state spaces of longitudinal control and lateral control are discussed in turn. The principle of lane recognition and the reference safe-distance model are explained. A reasonable training track is designed: the lanes selected in this paper include straight sections of suitable length and curves of medium curvature, which favors verification of the DRL algorithm.

3. Adaptive cruise control based on deep reinforcement learning. The design concept and practical application of the reward function for the three-axle semi-trailer under the ACC condition, derived from the control objectives, are elaborated, and the update functions of the DDPG networks are established. For the chosen driving conditions, the control objectives of the enhanced ACC comprise four points: (1) the heavy semi-trailer keeps to the straight lane; (2) turning on curves is realized while stability is maintained; (3) the vehicle follows when there is an obstacle ahead; (4) the vehicle drives at the set speed when there is no obstacle. Control is realized mainly by updating the parameters of the DDPG networks: given an input state, the agent outputs the best action for that moment so as to achieve the control goal. According to the control objectives of this paper, the reward function is set so that the vehicle pursues maximum reward, and the control process is realized in this way. To speed up agent training, this paper does four things. First, by collecting drivers' driving information on a three-axle semi-trailer test bench, the value range of the action space is reduced, directly lowering the chance that the agent randomly explores unreasonable actions. Second, the state space is scaled: because of the relationships among the dimensions of the state space, a large portion of states never occurs at all, some states are strongly correlated, and
the state characteristics are not distinctive. Scaling reduces the range of input values and directly reduces training complexity. Third, an agent that outputs two-dimensional actions is split into two agents that each output a one-dimensional action; after the first agent is trained, the parameters of its deep neural network are stored and imported into the second training environment to train the second agent. After this training process, the agents are used in the subsequent experimental verification. Fourth, during updating, the capacity of the sample pool used in the later stages of DDPG training is enlarged. That is, in the early stage of training, smaller samples are drawn from the pool, when the correlation among states is strong and many samples are likely useless exploration; in the later stage, the agent has learned to explore better, so more exploration is more effective, and enlarging the sample space improves the efficiency of updating and learning.

In a reinforcement-learning algorithm, the agent's control target is expressed abstractly as a special signal called the reward, which carries information from the environment to the algorithm and the agent. At each time step the reward is a single scalar value. In general, the agent's goal is to maximize the total reward it can obtain; that is, the whole process must maximize not the immediate return but the long-term cumulative reward. This informal idea can be stated explicitly as the reward hypothesis. One of the most distinctive features of reinforcement learning is that it expresses the goal through the magnitude of the reward value. Since the agent always seeks to maximize its return, if we want it to do something for us, the reward we provide must be such that maximizing return also achieves our goal. It is therefore essential that the reward function truly expresses our goal. Within the reinforcement-learning framework, the agent can only learn how to interact with the environment as the reward function defines, so the design of the reward function directly determines the control effect of the agent. The reward function must define rewards and penalties for the corresponding behaviors under different driving conditions, yet few studies consider vehicle stability from the perspective of vehicle system dynamics. Based on the load-transfer model of the three-axle semi-trailer, this paper analyzes the stability of the semi-trailer and comprehensively considers incentive factors such as driving efficiency, driving safety, and driving stability. To achieve adaptive cruise of the three-axle semi-trailer on straight and curved roads, the reward function is designed with the following terms: (1) distance-deviation penalty; (2) speed reward and over-speed penalty; (3) large-steering-angle penalty; (4) roll-stability penalty; (5) dynamic-safe-distance penalty; (6) termination penalty. Drawing on prior experience and experiments, the reward function is designed to these requirements, and these terms essentially cover and achieve the vehicle control objectives. A more precise verification is carried out in the next chapter.

4. Semi-trailer adaptive cruise control: experimental verification. The experimental results for the control targets are analyzed and organized. By varying the test environment, i.e. with the obstacle vehicle at constant high speed, constant low speed, and variable speed, the superiority of the DDPG-based decision strategy for heavy vehicles is verified, and the control effect is shown to be good in terms of lane keeping, safe distance to the preceding vehicle, and roll stability in high-speed cornering.
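The LTR criterion of part 1 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names and the threshold values (0.6 for the tractor, 0.5 for the trailer) are illustrative assumptions; only the definition of LTR and the use of separate tractor/trailer thresholds come from the text.

```python
def load_transfer_ratio(f_left, f_right):
    """LTR = (F_left - F_right) / (F_left + F_right).
    0 means the load is balanced; +/-1 means all load is on one side."""
    return (f_left - f_right) / (f_left + f_right)

def rollover_risk(ltr_tractor, ltr_trailer,
                  thr_tractor=0.6, thr_trailer=0.5):
    """Tractor and trailer are checked against different thresholds
    because their roll sensitivities differ (values are illustrative)."""
    return abs(ltr_tractor) > thr_tractor or abs(ltr_trailer) > thr_trailer
```

A supervisory layer would evaluate this check at every control step using the axle loads produced by the segmented load model.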
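The quartic polynomial correction of part 1 can be sketched with NumPy. This is a toy version under stated assumptions: the data below are synthetic, and only roll angle is used as the regressor, whereas the thesis fits against both roll angle and roll acceleration of the tractor and trailer and calibrates against simulation data.

```python
import numpy as np

# Synthetic relation between roll angle (rad) and the residual load
# error (N) on the rear axles; stands in for measured/simulated data.
roll_angle = np.linspace(-0.1, 0.1, 50)
load_error = 1500.0 * roll_angle**2 + 80.0 * roll_angle

# Quartic (degree-4) least-squares fit, then evaluate the correction.
coeffs = np.polyfit(roll_angle, load_error, deg=4)
correction = np.polyval(coeffs, roll_angle)

residual = np.max(np.abs(correction - load_error))
```

Because the synthetic target is itself polynomial, the quartic fit recovers it almost exactly; on real data the residual measures how much load error the correction leaves behind.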
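The fourth training acceleration of part 3, changing the capacity drawn from the sample pool as training progresses, can be read as a growing sampling schedule. The sketch below is one plausible interpretation, not the thesis's code: the class name, the linear growth schedule, and all numeric defaults are assumptions.

```python
import random
from collections import deque

class GrowingBatchBuffer:
    """Replay buffer whose sample size grows with training progress:
    small draws early (transitions are strongly correlated and often
    useless exploration), larger draws later (exploration is more
    effective). The linear schedule here is an illustrative choice."""

    def __init__(self, capacity=100_000, batch_min=32, batch_max=256,
                 grow_steps=10_000):
        self.buffer = deque(maxlen=capacity)
        self.batch_min, self.batch_max = batch_min, batch_max
        self.grow_steps = grow_steps
        self.step = 0

    def add(self, transition):
        self.buffer.append(transition)

    def batch_size(self):
        frac = min(1.0, self.step / self.grow_steps)
        return int(self.batch_min + frac * (self.batch_max - self.batch_min))

    def sample(self):
        self.step += 1
        k = min(self.batch_size(), len(self.buffer))
        return random.sample(self.buffer, k)
```

In a DDPG loop, `sample()` would feed each critic/actor update, so early updates see small, cheap batches and later updates see larger, more diverse ones.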
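The point that the agent maximizes long-term cumulative reward rather than the immediate reward corresponds to the standard discounted return; a minimal sketch (the discount factor 0.99 is a conventional default, not a value from the thesis):

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    Computed backwards so each step is one multiply-add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma < 1`, near-term rewards dominate but every future reward still contributes, which is exactly why a well-shaped reward function controls long-horizon behavior.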
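The six reward terms listed for the ACC task can be combined as a weighted sum. The sketch below only mirrors the term list; every weight, threshold, and functional form is an illustrative assumption, and the thesis's actual shaping (e.g. of the roll-stability term via the load-transfer model) is certainly more detailed.

```python
def acc_reward(dist_dev, speed, speed_set, steer, ltr, gap, safe_gap,
               terminated, w=(1.0, 0.5, 0.2, 2.0, 1.5, 50.0)):
    """Illustrative combination of the six reward-function terms."""
    r = 0.0
    r -= w[0] * abs(dist_dev)                          # (1) distance-deviation penalty
    if speed <= speed_set:
        r += w[1] * speed                              # (2) speed reward ...
    else:
        r += w[1] * (speed_set - (speed - speed_set))  # ... over-speed penalty
    r -= w[2] * steer**2                               # (3) large-steering-angle penalty
    r -= w[3] * max(0.0, abs(ltr) - 0.5)               # (4) roll-stability penalty
    r -= w[4] * max(0.0, safe_gap - gap)               # (5) dynamic-safe-distance penalty
    if terminated:
        r -= w[5]                                      # (6) termination penalty
    return r
```

This scalar is what the DDPG critic learns to predict and the actor learns to maximize, so the relative weights directly trade off tracking accuracy, speed, and roll stability.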