| With the rapid development of the Internet, the intelligent information age creates more opportunities for malicious behavior such as data leakage and network attacks, and network security faces severe challenges. Intrusion detection is a new information security technology used to detect intrusion behavior in computer network systems. Compared with traditional rule-based intrusion detection methods, methods based on machine learning can better identify unknown anomalies and handle the large, complex log data of the big data era. This thesis takes the public NSL-KDD intrusion detection data set as its research object. Decision Tree (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), eXtreme Gradient Boosting (XGBoost) and Support Vector Machine (SVM) are used to detect network intrusion, and a new feature selection algorithm based on a “tree model + Boruta” model is designed to improve the detection performance of these five machine learning models, providing reference and guidance for practical network intrusion detection projects. First, the NSL-KDD data set is preprocessed and XGBoost is used for feature selection. The optimal parameter combinations for DT, RF, GBDT, XGBoost and SVM are found through grid search. Intrusion detection experiments are then carried out with the five tuned models, and their detection performance is compared and analyzed. Next, a new feature selection algorithm based on the “tree model + Boruta” model is proposed, in two variants: “tree model + Boostaroota” and “tree model + Catboruta”. The “tree model + Boostaroota” algorithm comprises four schemes: Boostaroota alone, and Boostaroota combined with RF, GBDT or XGBoost. Catboruta is an improvement on Boostaroota that uses CatBoost in place of XGBoost to calculate feature importance and changes the "critical value" to the parameter p multiplied by the square of the average feature importance of the shadow features. The “tree model + Catboruta” algorithm likewise comprises four schemes: Catboruta alone, and Catboruta combined with RF, GBDT or XGBoost. Finally, the performance gains that the new feature selection algorithm brings to the machine learning models are compared and analyzed. The experimental results show that combining any of the three tree models with Boostaroota or Catboruta improves the detection performance of the machine learning models more than Boostaroota or Catboruta feature selection alone. Moreover, “tree model + Catboruta” yields a larger performance improvement than the “tree model + Boostaroota” scheme. These results indicate that the new feature selection algorithm has practical value for machine-learning-based network intrusion detection. |
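
The Catboruta critical value described above (the parameter p multiplied by the square of the average shadow-feature importance) can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: absolute Pearson correlation stands in for the CatBoost feature importance, shadow features are built by shuffling each real column once, only a single selection round is run, and the names `abs_correlation` and `shadow_select` are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(column, target):
    """Stand-in importance score (assumption): |Pearson correlation| with the target."""
    mx, my = mean(column), mean(target)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, target))
    vx = sum((x - mx) ** 2 for x in column)
    vy = sum((y - my) ** 2 for y in target)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (vx * vy) ** 0.5


def shadow_select(columns, target, p=1.0, seed=0):
    """One round of shadow-feature selection (hypothetical helper).

    columns: dict mapping feature name -> list of numeric values.
    Returns (kept feature names, critical value).
    """
    rng = random.Random(seed)
    # Importance of each real feature.
    real_scores = {name: abs_correlation(col, target) for name, col in columns.items()}
    # Shadow features: shuffled copies whose link to the target is destroyed.
    shadow_scores = []
    for col in columns.values():
        shuffled = list(col)
        rng.shuffle(shuffled)
        shadow_scores.append(abs_correlation(shuffled, target))
    # Critical value as stated in the text: p * (mean shadow importance) ** 2.
    critical = p * mean(shadow_scores) ** 2
    kept = [name for name, score in real_scores.items() if score > critical]
    return kept, critical


# Toy data (made up for illustration, not NSL-KDD).
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
features = {
    "signal": [2.0 * y + 1.0 for y in labels],      # perfectly tied to the label
    "noise": [data_rng.random() for _ in labels],   # unrelated to the label
    "constant": [3.0] * len(labels),                # zero importance
}
kept, critical = shadow_select(features, labels, p=1.0)
print(kept, critical)
```

In this sketch a feature survives only if its importance exceeds the critical value, so a larger p makes the selection stricter. Swapping `abs_correlation` for importances from a fitted tree model would bring it closer to the schemes compared in the thesis.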