| Network intrusion data suffer from redundancy, class imbalance, and high feature dimensionality, which makes intrusion detection research based on data processing and feature selection particularly important. To address these problems, this paper designs an algorithm model from two angles: data processing and feature selection. The specific work is as follows. First, the paper introduces the theory and role of data processing, analyzes the distribution and characteristics of the samples and features in current network intrusion detection data sets, and establishes the approach of this study: improve detection performance by reducing the data set through data processing and feature selection. Second, to address noise and redundancy in network intrusion data sets, a data sampling strategy based on mini-batch k-means (MBKM) is developed. To deal with class imbalance, the data are divided into majority-class and minority-class samples, and a data processing model based on MBKM and a genetic algorithm (GA) is proposed for the majority-class samples. The GA optimizes the clustering distance parameters to obtain the optimal reduced data set. Building on this reduced data set, a network intrusion detection algorithm based on MBKM-GA data processing is proposed. Third, analysis of the experimental results of the data processing model shows that the reduced data set still suffers from the curse of dimensionality. To keep this from degrading detection performance, a feature selection strategy based on recursive feature elimination (RFE) is proposed. After analyzing and comparing Lasso regression and ridge regression, Lasso regression is selected as the base model, and a feature selection model based on LASSO-RFE is proposed. The model addresses the high dimensionality of features in network intrusion data sets and produces a ranking of feature importance. Combining the output of the data processing model with the feature importance ranking, a network intrusion detection algorithm based on LASSO-RFE feature selection is proposed. Finally, the effectiveness of the proposed algorithm is evaluated on the UNSW-NB15 data set; comparison with a variety of other algorithms shows that it performs well in network intrusion detection. |
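
The MBKM-based reduction of the larger class described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the sampling rule, so the sketch assumes that each cluster is represented by the real sample nearest to its center, and it fixes the number of clusters `k` by hand instead of tuning clustering parameters with a GA. The function names (`mini_batch_kmeans`, `reduce_majority`) and the synthetic two-dimensional data are hypothetical and stand in for the UNSW-NB15 records.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def mini_batch_kmeans(data, k, batch_size=32, n_iter=100, seed=0):
    """Mini-batch k-means with a per-center learning rate of 1 / assignment count."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]  # k distinct samples as initial centers
    counts = [0] * k
    for _ in range(n_iter):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch first, then move the centers.
        labels = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


def reduce_majority(majority, k, **kwargs):
    """Assumed sampling rule: keep the real sample closest to each cluster center."""
    centers = mini_batch_kmeans(majority, k, **kwargs)
    keep = set()
    for c in centers:
        keep.add(min(range(len(majority)), key=lambda i: sq_dist(majority[i], c)))
    return [majority[i] for i in sorted(keep)]


def make_demo():
    """Synthetic data: three majority-class blobs and one small minority-class blob."""
    rng = random.Random(1)
    majority = [
        [rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(200)
    ]
    minority = [[rng.gauss(5, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
    return majority, minority


majority, minority = make_demo()
reduced = reduce_majority(majority, k=20)
balanced = reduced + minority  # the minority class is kept in full
print(len(majority), "->", len(reduced), "majority samples;",
      len(minority), "minority samples kept")
```

Two centers can select the same nearest sample, so the reduced set may hold fewer than `k` records. For real data sets, a library implementation such as scikit-learn's `MiniBatchKMeans` would likely be a better starting point than this pure-Python loop.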