| In recent years, the state has vigorously developed transportation, and the expressway industry in particular has grown rapidly. By the end of 2018, China’s expressway mileage exceeded 140,000 kilometers, ranking first in the world. At the same time, with the continuous expansion of the transportation network and the large increase in the number of vehicles, traffic accidents occur frequently, traffic congestion is serious, and normal traffic flow is disrupted. Timely and accurate traffic incident detection can effectively alleviate the congestion caused by traffic incidents, prevent secondary accidents, and improve expressway traffic safety. Automatic Incident Detection (AID) is a typical binary classification problem that assigns traffic states to one of two classes: non-incident state and incident state. In real life, the amount of incident state data is generally much smaller than the amount of non-incident state data. Therefore, traffic incident detection is an imbalanced binary classification problem. In this paper, several traffic incident detection models are proposed, with classifiers based on basic machine learning methods combined with preprocessing algorithms for imbalanced datasets.
Firstly, I adopt Bayes Network, Logistic Regression, SVM, Decision Tree, Neural Network, AdaBoost, Gradient Boosting Decision Tree and Random Forest in turn as classifiers and optimize the parameters of each classifier with grid search. By comparing the performance of the different classifiers, Random Forest, which has the best overall performance, is selected as the classifier of the traffic incident detection algorithm. Secondly, an AID algorithm combining Random Forest with SMOTE (Synthetic Minority Over-sampling Technique) is proposed to process imbalanced traffic datasets. This algorithm first uses minority samples to synthesize artificial samples in order to achieve a balanced data distribution, and then uses the Random Forest classifier to obtain the corresponding performance measures, which are compared with those of the earlier models. Finally, because the samples generated by SMOTE may contain duplicates, which can easily cause over-fitting, the data cleaning techniques Tomek links and ENN (Edited Nearest Neighbor) are introduced to eliminate duplicate samples. This algorithm first uses SMOTE to obtain a balanced dataset, then uses Tomek links or ENN to eliminate duplicate samples from the balanced dataset, and finally classifies the cleaned data with the Random Forest classifier. After comparison and analysis, the optimal AID algorithm is Random Forest based on SMOTE and Tomek links. The experimental datasets in this paper are derived from real traffic data collected at the Wuxi detection points of the Beijing-Shanghai Expressway. The experiments are implemented in PyCharm. The experimental results indicate that the Random Forest traffic incident detection algorithm based on SMOTE and Tomek links can optimize AID results on imbalanced datasets, improve detection efficiency and obtain better comprehensive performance. |
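
The resampling step described above (SMOTE over-sampling followed by Tomek-link cleaning) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`smote`, `remove_tomek_links`, `smote_tomek`), the choice of `k`, the Euclidean distance, and the decision to drop both members of each Tomek link are all assumptions made for illustration, and the Random Forest classification stage is omitted.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link, i.e. every pair of
    opposite-class samples that are each other's nearest neighbour."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: euclidean(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Balance the classes with SMOTE, then clean with Tomek links."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_bal, y_bal)
```

In practice the same pipeline would more likely be assembled from existing libraries, for example `SMOTETomek` from imbalanced-learn followed by scikit-learn's `RandomForestClassifier` tuned with `GridSearchCV`; whether the paper used these particular tools is not stated.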