Research On Generation And Classification Methods Of Unbalanced Samples In Abnormal Traffic Detection

Posted on: 2022-06-23; Degree: Master; Type: Thesis
Country: China; Candidate: R J Zhang; Full Text: PDF
GTID: 2518306557967889; Subject: Computer technology
Abstract/Summary:
With the spread of the Internet, network data traffic has grown explosively, and the network security situation has become increasingly serious. Networks carry a large amount of abnormal traffic that must be detected and handled promptly. As machine learning has matured, it has been applied more and more widely to abnormal traffic detection. However, abnormal traffic usually accounts for only a small fraction of total network traffic, so the training set is imbalanced between positive and negative samples, which degrades model training. In addition, a single machine learning model often performs poorly on complex and changeable abnormal traffic data. To address abnormal traffic detection with unbalanced samples, this thesis proposes improvements at both the data level and the algorithm level.

At the data level, this thesis proposes an unbalanced sample generation method based on a variational auto-encoder (VAE). The core idea is not to expand all minority samples, but to analyze them and expand only the boundary samples, which are the ones most likely to confuse a machine learning model. First, the KNN algorithm is used to select the minority samples that lie closest to the majority samples. Second, the DBSCAN algorithm clusters the samples selected by KNN into one or more sub-clusters. Finally, a VAE network model is designed to learn and expand the minority samples in each sub-cluster found by DBSCAN. The new samples are added to the original samples to build a new training set.

At the algorithm level, we first design and implement a comprehensive evaluation algorithm for weak classifiers, so that the model achieves a better prediction effect with as few weak classifiers as possible. We also use four different models to form a heterogeneous ensemble, which maximizes the diversity of the ensemble learning system at the algorithm level. We then design and implement a training-sample annotation algorithm for the weak classifier selector, whose main purpose is to prepare the selector's training samples. Finally, an adaptive ensemble learning algorithm for abnormal traffic detection is designed and implemented. Compared with traditional ensemble learning algorithms, it adds a weak classifier selector module. In the training stage, the annotation data produced by the training-sample annotation algorithm is used to train the selector model. In the test stage, each test sample is first fed to the weak classifier selector, which determines which weak classifiers take part in the ensemble. The selected weak classifiers then judge the abnormal traffic test sample, and their individual judgments are combined by voting to produce the final prediction.

Finally, comparative experiments were designed for verification, with recall and F1 score as the evaluation metrics. To test unbalanced sample generation, models were trained on the original samples, on samples generated by SMOTE and its improved version, and on samples generated by the method proposed in this thesis, and the results were compared. To test ensemble learning, the detection performance of traditional abnormal traffic detection algorithms was compared with that of the proposed adaptive ensemble learning algorithm. The experimental results show that the data-level and algorithm-level optimizations effectively improve detection under sample imbalance. Although detection takes more time, recall and F1 score are significantly improved compared with traditional methods.
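The first data-level step, using KNN to pick out the minority samples that sit closest to the majority class, can be illustrated with a small sketch. This is not the thesis's implementation: the function name `boundary_minority`, the value `k=3`, the Euclidean distance, and the rule "at least half of the k nearest neighbours are majority samples" are all assumptions chosen for illustration, since the abstract does not state the exact criterion. The DBSCAN clustering and VAE generation steps are not shown.

```python
import math


def boundary_minority(samples, labels, k=3, minority=1):
    """Return indices of minority samples that look like boundary samples.

    Assumed criterion (not taken from the thesis): a minority sample is a
    boundary sample when at least half of its k nearest neighbours
    (Euclidean distance, excluding itself) belong to the majority class.
    """
    selected = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if y != minority:
            continue
        # Sort all other samples by distance to x and keep the k closest.
        dists = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(samples)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        majority_count = sum(labels[j] != minority for j in neighbours)
        if majority_count * 2 >= k:
            selected.append(i)
    return selected


# Toy data: label 0 is the majority (normal) class, label 1 the minority
# (abnormal) class. In practice the majority class would be far larger.
samples = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # majority cluster
    (1.2, 1.2),                                      # minority, near majority
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),              # minority, far away
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(boundary_minority(samples, labels, k=3))
```

On this toy data only the minority point next to the majority cluster (index 4) is selected; the three minority points that form their own distant group are left out. In the pipeline described above, the selected samples would then be passed to DBSCAN and the VAE.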
Keywords/Search Tags:abnormal flow, variational auto-encoder, imbalanced sample, KNN, DBSCAN, ensemble learning