| In recent years, with the rapid growth of car ownership, expressways, as fast-moving corridors, have met people's need for convenient travel. At the same time, traffic safety has attracted widespread attention, and the identification and prediction of safety risks is a key issue. Existing research on expressway traffic safety risk takes one of two approaches: passive safety management after the accident, based on historical accident data, or analysis of the influencing factors before the accident, which relies mainly on traffic flow data alone. With the development of data collection technology, big data makes proactive safety prevention and control possible. Based on accident data and on traffic operation and driving behavior big data, this thesis takes an expressway in Shandong Province as an example to carry out data-driven research on traffic safety risk identification and prediction, and reveals the operation and driving patterns behind accidents. Firstly, the collected expressway accident data are cleaned, integrated, transformed and reduced. Statistical analysis and GIS technology are used to analyze the temporal distribution, spatial distribution, accident form, accident level, accident type and weather distribution of traffic accidents. The results show that accidents are distributed relatively evenly across the quarters. Accident risk is high in October and on Fridays and weekends. More accidents occur in the first lane and the emergency lane. Minor accidents account for the largest proportion of expressway accidents. The main accident forms are guardrail collisions and rear-end collisions. Secondly, using social network analysis, the Apriori association rule algorithm and visualization, hierarchical association rule mining of accident risk is carried out from the macro to the micro level, and the association rules output by the Apriori algorithm are analyzed by clustering and
multi-dimensional interaction. The results show that road-factor attributes related to pile number and lane have a great influence on traffic safety risk. Small passenger cars, the first lane and sunny days are the key risk factors affecting expressway traffic accidents. Guardrail collisions and rear-end collisions are the main accident forms. Pile number intervals that contain curved roads and many interchanges are identified as accident risk sections. Thirdly, a multi-source heterogeneous database is established by fusing traffic operation data, driving behavior data and accident data. The multi-source data labels are unified through pile number calibration, latitude and longitude mapping, etc., and a database with a time granularity of 1 h and a spatial granularity of 1 km is built using layer overlay in GIS. The characteristic variables are screened with the Pearson correlation coefficient, and three classification prediction algorithms are selected. The results show that different types of multi-source data can be fused through GIS to form a database; nine characteristic variables (sharp left turn, sharp right turn, sharp merge into the left lane, sharp merge into the right lane, sharp acceleration, sharp deceleration, congestion index, average speed and speed variation coefficient) are selected as model input variables. Three classification algorithms, XGBoost, logistic regression and support vector machine, are selected as the modeling methods. These three algorithms are then used to construct six expressway traffic safety risk prediction models in two groups: group A, based on traffic operation data, and group B, based on traffic operation and dangerous driving behavior data. Their classification performance and prediction accuracy are compared and evaluated using confusion matrices. The results show that the XGBoost
classification model based on group B has the highest prediction accuracy, 88.37%, and the lowest false alarm rate, 11.64%. The selection of the optimal model shows that adding dangerous driving behavior data can effectively improve the accuracy and classification ability of the expressway traffic safety risk prediction model. Finally, the SHAP method is used to explain the optimal expressway traffic safety risk prediction model, and the importance and direction of influence of six single features on the traffic safety risk state are analyzed. The influence of the interaction between the two types of features on the model is analyzed, and the influence of specific single and multiple features on the traffic safety risk state is explored in combination with the space-time conditions. The results show that characteristic variables such as sharp deceleration, sharp acceleration, congestion index and average speed have a great influence on traffic safety risk. The times of relatively high risk are April and November; Monday and Friday; and 10:00, 14:00, 15:00 and 19:00 within a day. The high-risk sections are K15-K21, K318-K323 & K0-K6, and K301-K307 in section A. In these sections and at these times, drivers show sharp acceleration or sharp deceleration at a frequency greater than or equal to 2 times/h·km. An average speed of 65-80 km/h or greater than 110 km/h, a speed coefficient of variation of 0.02-0.08, or a congestion index of 1.05-1.5 indicates a dangerous state; in addition, if the average speed is 80-110 km/h and the sharp acceleration is greater than or equal to 2 times/h·km, or the average speed is greater than 110 km/h and the sharp acceleration or sharp deceleration is greater than or equal to 2 times/h·km, or the speed variation coefficient is greater than 0.02 and the sharp acceleration or sharp deceleration is greater than or equal to 2 times/h·km, or the congestion index is 1.05-1.5 and the sharp acceleration or sharp deceleration is greater than or equal
to 2 times/h·km, the traffic is also in a dangerous state. |
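
The threshold conditions reported at the end of the abstract can be read as a small set of rules applied to each 1 h by 1 km record. The following is a minimal sketch of that reading, not the thesis's implementation: the class and function names, the field names and the rule labels are hypothetical, interval endpoints are assumed to be inclusive, and the restriction to the high-risk months, days, hours and sections is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SegmentHour:
    """One 1 h x 1 km record. Field names are hypothetical."""
    sharp_accel: float       # sharp accelerations, times/(h*km)
    sharp_decel: float       # sharp decelerations, times/(h*km)
    avg_speed: float         # average speed, km/h
    speed_cv: float          # speed coefficient of variation
    congestion_index: float  # congestion index


# Event frequency threshold quoted in the abstract: 2 times/(h*km).
EVENT_THRESHOLD = 2.0


def single_feature_flags(r: SegmentHour) -> list[str]:
    """Single-feature conditions described as dangerous.

    Assumption: all interval endpoints are inclusive.
    """
    flags = []
    if 65 <= r.avg_speed <= 80 or r.avg_speed > 110:
        flags.append("avg_speed")
    if 0.02 <= r.speed_cv <= 0.08:
        flags.append("speed_cv")
    if 1.05 <= r.congestion_index <= 1.5:
        flags.append("congestion_index")
    return flags


def interaction_flags(r: SegmentHour) -> list[str]:
    """Two-feature conditions described as dangerous."""
    accel = r.sharp_accel >= EVENT_THRESHOLD
    accel_or_decel = accel or r.sharp_decel >= EVENT_THRESHOLD
    flags = []
    if 80 <= r.avg_speed <= 110 and accel:
        flags.append("speed_80_110_x_accel")
    if r.avg_speed > 110 and accel_or_decel:
        flags.append("speed_gt_110_x_accel_or_decel")
    if r.speed_cv > 0.02 and accel_or_decel:
        flags.append("speed_cv_x_accel_or_decel")
    if 1.05 <= r.congestion_index <= 1.5 and accel_or_decel:
        flags.append("congestion_x_accel_or_decel")
    return flags


def is_dangerous(r: SegmentHour) -> bool:
    """True if any single-feature or interaction condition is met."""
    return bool(single_feature_flags(r) or interaction_flags(r))


if __name__ == "__main__":
    record = SegmentHour(sharp_accel=2, sharp_decel=0, avg_speed=95,
                         speed_cv=0.01, congestion_index=1.0)
    print(single_feature_flags(record), interaction_flags(record))
```

Returning the labels of the triggered conditions, rather than a bare boolean, keeps the result traceable to the specific threshold that fired, which is closer in spirit to the feature-level explanation the abstract attributes to SHAP.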