
The Enhancement Method For Network Traffic Data In Anomaly Detection

Posted on: 2022-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
GTID: 2518306341953629
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of network technology and the steady expansion of network scale, the Internet has reached every aspect of society and brought unprecedented changes to work and life. Network security problems have followed, posing serious hidden risks to normal network operation, so it is increasingly important to discover network anomalies promptly. Among common network protection methods, network traffic anomaly detection discovers abnormal network behavior by learning from network traffic data. Classification-based anomaly detection, which draws on a range of techniques from the data mining field, has attracted researchers' attention and is widely used. These methods learn features of network traffic data to build traffic classification models, and they rely on traffic datasets to supply rich information, so traffic data is an essential foundation for network traffic anomaly detection.

Traffic datasets, however, are often imbalanced. When normal and abnormal traffic are to be distinguished, there is often a large gap between the amounts of normal and abnormal traffic data. This imbalance easily leads the anomaly detection model built afterwards to overfit the large amount of normal traffic data, so that the small amount of abnormal traffic data cannot be identified effectively, which degrades detection performance. This thesis therefore uses data enhancement to enrich the information in the original dataset, addressing the imbalance problem in anomaly detection and ultimately improving the effectiveness of traffic anomaly detection.

At present, commonly used data enhancement methods enrich the information of minority samples by strengthening the recognition of boundary samples, optimizing the selection of minority samples, preprocessing the minority class with clustering, and cleaning noise. However, most existing methods struggle with problems that frequently occur in datasets, such as data separation and uneven data distribution. Multi-class datasets raise further problems: different classes have different important features, and noisy samples are difficult to identify. In view of the imbalance of network traffic data, this thesis proposes network traffic data enhancement methods for anomaly detection, with corresponding solutions for the binary-class and multi-class scenarios. The research work is as follows.

(1) Binary-class data enhancement method for traffic anomaly detection: For identifying two kinds of network behavior that occur with very different frequencies, we propose a data enhancement method based on first-nearest-neighbor clustering and a multilayer perceptron, targeting the problems of data separation and uneven data distribution in binary-class network traffic datasets. First, minority-class clusters are filtered by first-nearest-neighbor clustering. Next, the number of samples to synthesize in each cluster is assigned adaptively according to the distribution of samples within the cluster, and initial weights are assigned to the samples in the cluster. Finally, noisy samples are cleaned using a multilayer perceptron during data synthesis. The experimental results show that the proposed method effectively enriches the information of minority samples in imbalanced binary-class network traffic datasets and ultimately improves network traffic anomaly detection.

(2) Multi-class data enhancement method for traffic anomaly detection: For identifying multiple network behaviors that occur with very different frequencies, we propose a data enhancement algorithm based on dimensionality-reduction synthesis and XGBoost (eXtreme Gradient Boosting), targeting the problems that different classes have different important features in multi-class network traffic datasets and that noisy samples are difficult to identify. First, the samples used for synthesis are selected based on the information entropy of the minority-class samples. Then the data are decomposed by principal component analysis so that the features are uncorrelated with each other, and new data are synthesized. Finally, noise cleaning is performed on the newly synthesized samples using an XGBoost-based voting decision mechanism. The experimental results show that the proposed method effectively enriches the sample information of multiple minority classes in imbalanced multi-class network traffic datasets and ultimately improves network traffic anomaly detection.
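
Both proposed methods synthesize new minority-class samples, but the abstract does not state the synthesis rule itself. As an illustration only, the sketch below assumes a SMOTE-style rule: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors. The function names, the value of k, and the toy data are hypothetical, and the clustering, adaptive allocation, PCA, and MLP/XGBoost noise-cleaning steps described above are deliberately left out.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k points closest (Euclidean) to points[idx], excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def interpolate_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Assumed SMOTE-style rule, not the thesis's exact procedure.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # pick a base minority sample
        j = rng.choice(k_nearest(minority, i, k))  # pick one of its neighbors
        gap = rng.random()                         # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy, made-up feature vectors: 50 "normal" flows and 4 "attack" flows.
normal = [[float(x), float(y)] for x in range(10) for y in range(5)]
attack = [[20.0, 20.0], [21.0, 20.5], [20.5, 21.5], [22.0, 21.0]]

# Synthesize enough attack samples to match the size of the normal class.
new_attack = interpolate_oversample(attack, n_new=len(normal) - len(attack))
balanced_attack = attack + new_attack
print(len(normal), len(balanced_attack))
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. That also means a noisy or mislabeled minority sample can seed further noisy samples, which is one reason a cleaning step after synthesis, as both methods include, is useful.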
Keywords/Search Tags:Data enhancement, Network anomaly detection, Oversampling, Imbalance learning, Noise cleaning