Research On Network Intrusion Detection Based On Hybrid Sampling And Deep Integration Algorithm

Posted on: 2022-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Yan
Full Text: PDF
GTID: 2518306539481464
Subject: Software engineering
Abstract/Summary:
In recent years, the Internet has penetrated every field. With the rapid development of computer technology and networks, network security faces severe tests, and new forms of network attack keep emerging. Detecting abnormal network traffic quickly and efficiently has therefore become an important issue in network security. Existing research on network intrusion detection suffers from imbalance between positive and negative samples in the research data, poor predictive ability on low-proportion samples, and complicated technical procedures that are difficult to deploy widely. To address this, this thesis balances the data set with a hybrid sampling method that combines EasyEnsemble downsampling with TableGAN upsampling, and proposes a network intrusion detection algorithm based on the StackingGCForest algorithm. The specific research approach is as follows.

First: Three data sets are collected, KDD-99, NSL-KDD, and UNSW-NB15, and standard machine learning preprocessing is applied to them. The main steps are sample deduplication, conversion of character variables, and data standardization. Preprocessing reduces interfering items in the data and restores its fidelity.

Second: After observing the data distribution in the three data sets, the EasyEnsemble-TableGAN hybrid sampling method is used to balance the unbalanced sample sets. The main idea is to set an appropriate target proportion for the data. High-proportion samples are downsampled with EasyEnsemble, and low-proportion samples are then upsampled with the TableGAN algorithm. Finally, the downsampled and upsampled samples are merged and randomly shuffled. This hybrid up-and-down sampling makes the data distribution more balanced.

Third: On the balanced data set, five highly heterogeneous machine learning algorithms are first used to build network intrusion detection models: Logistic Regression (LR), Decision Tree (DT), Multilayer Perceptron (MLP), Random Forest (RF), and LightGBM. The Stacking ensemble idea is then introduced: the five models above serve as the Stacking base models, LightGBM is selected as the Stacking meta-model, and the differences between the Stacking ensemble model and the five base models on the test set are analyzed. Next, the deep forest GCForest is used to build a network intrusion detection model, different multi-granularity scanning sliding windows are tried to tune it, and the best-performing model is output. Finally, to draw on the advantages of both Stacking and GCForest, the two algorithms are merged, hyperparameters are optimized through grid search, comprehensive evaluation indicators such as G-mean are used to evaluate the model, and the model with the best prediction performance is output.

Compared with conventional machine learning algorithms, the method in this thesis takes into account both the imbalance of the samples and the uncertainty of the data. The public network data sets KDD-99, NSL-KDD, and UNSW-NB15 are used to verify the feasibility of the proposed algorithm. Experiments show that recall on low-proportion samples improves greatly and that the model's overall G-mean also improves. These improvements support the effectiveness and feasibility of the proposed algorithm, which has some guiding value for practical applications.
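The balancing step and the G-mean indicator described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. Random undersampling stands in for EasyEnsemble, and random oversampling with replacement stands in for TableGAN synthesis. The names `hybrid_resample`, `g_mean`, and `target_per_class` are hypothetical. G-mean is assumed here to be the geometric mean of per-class recall, which is the usual definition; the abstract itself does not define it.

```python
import math
import random
from collections import Counter, defaultdict


def hybrid_resample(rows, labels, target_per_class, seed=0):
    """Bring every class to target_per_class samples, then shuffle.

    Assumption: plain random undersampling replaces EasyEnsemble, and
    random oversampling with replacement replaces TableGAN synthesis.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    balanced = []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # High-proportion class: downsample without replacement.
            chosen = rng.sample(members, target_per_class)
        else:
            # Low-proportion class: pad with resampled duplicates.
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        balanced.extend((row, label) for row in chosen)

    # Merge the down- and up-sampled parts and shuffle them together.
    rng.shuffle(balanced)
    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / total[c] for c in total]
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    rows = [[i] for i in range(12)]
    labels = ["normal"] * 10 + ["attack"] * 2
    _, new_labels = hybrid_resample(rows, labels, target_per_class=5)
    print(Counter(new_labels))

    # A model that never predicts the rare class scores 75% accuracy
    # on this toy input, but its G-mean is 0.
    print(g_mean([0, 0, 0, 1], [0, 0, 0, 0]))
```

Because G-mean multiplies the per-class recalls, a model that ignores a low-proportion class scores zero regardless of its overall accuracy. This is why it suits imbalanced intrusion data better than accuracy alone.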
Keywords/Search Tags: network intrusion, sample imbalance, EasyEnsemble, TableGAN, Stacking integration, deep forest