| This paper studies the potential relationship between the severity level of traffic accidents and accident factors, using the traffic accident records of City from 2011 to 2015 as the research object. First, noise was removed from the traffic accident data and the missing values were filled in. Second, the degree of correlation between each accident factor and the accident level was measured, and the accident factors most strongly correlated with the accident level were identified from the measurement results. Finally, the correlation between combinations of accident factors and the accident level was analyzed, and the accident factor rules leading to accidents of different levels were extracted. The main contents of this paper are as follows: (1) A method for filling in missing values in traffic accident data based on the Bayesian ridge regression algorithm is proposed. The algorithm estimates the missing values from the patterns in the known data. The accuracy of the filled-in values on the test set was 79%. Filling in the missing values with this algorithm therefore preserves the information in the data reasonably well and yields a complete data set. (2) A method for measuring the degree of correlation between each accident factor and the accident level, based on information gain and the chi-square test, is proposed. Information gain and the chi-square test were each used to quantify the correlation between each accident factor and the accident level. The results of the two methods were then clustered separately. Five accident factors strongly correlated with the accident level were obtained from 100 accident factors. The two clustering results are consistent, which further supports the validity of the correlation mining between the selected accident factors and the accident level. (3) A decision model based on the decision tree algorithm is presented to determine the correlation between accident 
factor combinations and the accident level. The data set composed of the major accident factors and the accident levels is divided into a training set and a test set, used to train and evaluate the model respectively. From the resulting decision tree, the proportions of the different values of the accident factor represented by each node were counted. The rules of accident factors leading to each level of accident were obtained by combining the values that account for a large proportion. After filtering out interfering factors and training the models, four decision tree models with accuracy higher than 79% on the test set were obtained. The decision tree results lead to the following conclusions: blood alcohol content and intersection type were the two main factors leading to high-level accidents, and 12 accident factor rules that contribute to accidents of different levels were extracted. These experimental results can provide reference suggestions for the traffic management department to reduce the occurrence of accidents at all levels. |
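
As a rough illustration of the correlation measurement described in item (2), the sketch below computes information gain and the chi-square statistic between one categorical accident factor and the accident level. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy records, and the factor names `drunk` and `weather` are hypothetical, and the clustering step applied to the scores is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(factor, level):
    """Reduction in entropy of `level` once `factor` is known."""
    total = len(level)
    groups = {}
    for f, y in zip(factor, level):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(level) - conditional


def chi_square(factor, level):
    """Pearson chi-square statistic of the factor-by-level contingency table."""
    total = len(level)
    observed = Counter(zip(factor, level))  # missing cells count as 0
    row = Counter(factor)
    col = Counter(level)
    stat = 0.0
    for f in row:
        for y in col:
            expected = row[f] * col[y] / total
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy records (not taken from the paper's data set).
level = ["high", "high", "high", "high", "low", "low", "low", "low"]
factors = {
    "drunk": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "weather": ["rain", "clear", "rain", "clear",
                "rain", "clear", "rain", "clear"],
}

# Score every factor with both measures.
scores = {name: (information_gain(values, level), chi_square(values, level))
          for name, values in factors.items()}

for name, (ig, chi2) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: information gain = {ig:.3f}, chi-square = {chi2:.3f}")
```

In this toy data `drunk` determines the level exactly, so it scores highest on both measures, while `weather` is independent of the level and scores zero on both. Agreement between the two rankings is the kind of consistency check the abstract describes.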