| Accurately detecting potentially harmful intrusion behaviors in network data, and building intrusion detection systems with higher detection rates, is a research focus in network security. However, network intrusion datasets contain many classes of intrusion behavior, and the number of samples varies greatly between classes. These datasets are also high-dimensional and contain redundant data. This paper studies how to improve the performance of intrusion detection systems in three ways. First, to address the large number of outliers and the imbalanced sample distribution in network intrusion datasets, an optimized data sampling method based on Isolation Forest, iForest_DS, is proposed. Outliers that degrade model performance are detected and removed with the Isolation Forest algorithm, and the sampling ratio is then optimized by combining a genetic algorithm with the Random Forest algorithm. Together, these steps yield an effective data sampling model. Second, to address high dimensionality and redundant features, a feature selection method, MI_GARF_FS, is proposed that combines a mutual-information-based filter method with a wrapper method. The method uses mutual information to score and rank the features and keeps the top-ranked features as a candidate subset. From this subset, a wrapper method based on a genetic algorithm and the Random Forest algorithm selects the optimal feature subset for each network behavior. Third, because different network intrusion behaviors have different characteristics, an ensemble intrusion detection model based on the Random Forest algorithm, RFEn_IDS, is proposed to detect each kind of intrusion more efficiently. The model integrates its base models through a voting strategy. Each base model uses a classifier built with the Random Forest algorithm, applied after the data sampling and feature selection methods proposed in this paper, to classify and detect intrusion behavior. Finally, the data sampling method iForest_DS, the feature selection method MI_GARF_FS, and the final intrusion detection model RFEn_IDS are evaluated on the benchmark datasets UNSW-NB15 and NSL-KDD, and the results are compared with intrusion detection models built from other classical machine learning algorithms. |
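
The filter stage of the feature selection method (scoring features by mutual information with the class label and keeping the top-ranked ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mutual_information` and `top_k_features`, the toy columns, and the choice of `k` are all hypothetical, features are assumed to be discrete, and the subsequent wrapper stage (genetic algorithm plus Random Forest) is omitted.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, labels, k):
    """Rank features by mutual information with the labels; return the top k names and all scores."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data, not drawn from UNSW-NB15 or NSL-KDD.
labels = ["attack", "attack", "normal", "normal", "attack", "normal"]
columns = {
    "proto": ["tcp", "tcp", "udp", "udp", "tcp", "udp"],  # fully determines the label
    "flag": ["S0", "SF", "SF", "SF", "S0", "S0"],         # weakly related to the label
    "const": [0, 0, 0, 0, 0, 0],                          # carries no information
}

selected, scores = top_k_features(columns, labels, k=2)
print(selected)
```

In this toy case the constant column scores zero and is dropped, which is the kind of redundant feature the filter stage is meant to remove before the more expensive wrapper search runs.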