| A typical feature of highways in Shanxi Province is their high bridge-tunnel ratio, and tunnels are bottleneck sections of these highways. Predicting traffic accidents in the province’s highway tunnels supports a comprehensive analysis of the tunnel network and of trends in traffic safety, and thus provides a basis for proposing specific road safety designs, traffic engineering facilities and traffic safety management measures, for formulating road traffic policies, and for analyzing and evaluating the regional traffic safety situation. This paper relies on the project "Safety Technology Research on Highway Tunnel Operation and Management in Shanxi Province". Its main purpose is to analyze the factors influencing highway tunnel traffic accidents in the province and to predict the number of such accidents. First, based on a literature review and on data collection and collation, the influencing factors and distribution characteristics of accidents were analyzed. The influencing factors fall into four categories: people, vehicles, tunnels and environment. The distribution characteristics cover three aspects: temporal distribution, spatial distribution and accident pattern distribution. The data were then processed for missing values and abnormal values. For missing values, four imputation methods were initially considered: mean imputation, zero imputation, KNN regression imputation and random forest imputation. A test set containing missing values was constructed from the complete dataset, the imputation performance of each method was evaluated on it, and the KNN-2 algorithm, which performed best, was selected to handle the missing values. For outliers, following a review of outlier processing methods, the IForest (isolation forest) method was used for detection; 18 sample records were selected as a worked example of the algorithm, demonstrating binary tree construction, path length calculation and outlier index calculation. Finally, because the outlier indices of the detected data were relatively concentrated, normal and abnormal data were separated at a ratio of 8:2. After these two processing steps, 3347 complete and normal accident records were obtained. Next, because traditional regression methods are highly interpretable, a binary logit regression model was used to analyze the factors influencing key accidents. Three types of accidents, namely "accidents with casualties", "accidents with long duration" and "serial rear-end accidents", were identified as key accidents. The correlations among the independent variables were analyzed and strongly collinear factors were eliminated; the remaining independent variables were then entered into the binary logit regression model and the parameters were estimated at a significance level of 0.2. Finally, a combined BPNN-AdaBoost model was used to predict accidents. The prediction target was defined as the number of accidents in each tunnel in each year; then, based on the data cleaned of missing values and outliers, pavement type, large-vehicle ratio, traffic volume and the number of accidents in previous years were chosen as input variables of the prediction model. The prediction results were obtained through one-hot encoding, PCA dimensionality reduction, and the construction, training and optimization of the BPNN-AdaBoost model. A comparison on three metrics, MAE, RMSE and accuracy within an error range of two, shows that the BPNN-AdaBoost model is well suited to traffic accident prediction. |
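
The three IForest steps named in the abstract (binary tree construction, path length calculation, outlier index calculation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the number of trees, and the 18 two-dimensional sample points are all hypothetical, chosen only to mirror the size of the worked example. The outlier index follows the standard isolation forest definition, s(x) = 2^(-E[h(x)] / c(n)), where h(x) is the path length and c(n) is the average path length of an unsuccessful binary search tree lookup over n points.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Step 1: split recursively on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # simplification: stop if the chosen feature cannot split
        return {"size": len(points)}
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Step 2: edges traversed, plus c_factor for unsplit points in the leaf."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[side], depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Step 3: outlier index in (0, 1]; values near 1 suggest an outlier."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [
        build_tree(rng.sample(points, psi), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores


# Hypothetical data: 17 clustered records plus one obvious outlier (18 total).
points = [(float(i % 5), float(i // 5)) for i in range(17)] + [(40.0, 40.0)]
scores = anomaly_scores(points)
most_abnormal = max(range(len(points)), key=lambda i: scores[i])
print(points[most_abnormal], round(scores[most_abnormal], 3))
```

Records that are isolated after only a few splits have short average paths and therefore a high outlier index. A threshold on that index, or a fixed share of the highest-scoring records, can then be used to separate abnormal from normal data.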