
Research On Network Traffic Classification And Anomaly Detection Based On Deep Learning

Posted on: 2022-12-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L Zhang  Full Text: PDF
GTID: 1488306743973869  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development and widespread adoption of the Internet, the types and volume of anomalous traffic are increasing day by day. Anomaly traffic detection, as an important part of computer system and network security, has become a research hotspot. Traffic classification based on machine learning is the most widely studied category of anomaly traffic detection methods, but designing and extracting feature sets that accurately describe traffic characteristics remains the main problem facing current research. Compared with traditional machine learning, deep learning does not require manual design and extraction of feature sets, which brings new opportunities for traffic classification and anomaly detection. This dissertation therefore studies traffic classification and anomaly detection based on deep learning. The main research work and contributions are as follows:

(1) Construction of new network traffic datasets. In machine-learning-based traffic classification and anomaly detection, the traffic dataset is the basis for training the classification model, and dataset quality plays a key role in verifying whether a classification method is effective. Data redundancy, obsolete data types, and imbalanced data distribution are found to be common problems in current public datasets. To address these problems, public datasets are optimized and new datasets are constructed. First, an oversampling method for imbalanced data based on L-SMOTE (SMOTE: Synthetic Minority Oversampling Technique) is proposed. It oversamples the traffic types that account for only a small share of the dataset in order to balance the data, which addresses the poor classification of minority classes. Then, through traffic data collection, data labeling, and data balancing, the large-scale traffic classification dataset TJUTC (Tianjin University of Technology Dataset for Traffic Classification) and the abnormal traffic dataset TJUTD (Tianjin University of Technology DDoS Dataset) are systematically constructed. Compared with existing datasets, TJUTC and TJUTD have clear advantages in data volume, traffic types, and extensibility, and parts of the datasets are published for use by researchers. The methods proposed in this dissertation use both public datasets and the constructed datasets as the basis for model training and as the classification benchmark.

(2) To address the complexity of feature design and extraction in traffic classification methods based on traditional machine learning, a traffic classification method based on NetFlow and DNN (Deep Neural Network) is proposed. The method uses NetFlow records as the basis for traffic classification and mines deep combined features of the NetFlow data with a deep neural network, so that the network constructs the traffic feature set itself and feature design and extraction become automatic. Because the NetFlow record format is uniform and easy to obtain, the method saves a large amount of feature design and extraction work and reduces its complexity. The influence of the DNN structure on classification performance is examined through several experiments, and comparative experiments are carried out with three classifiers on two datasets. Experimental results show that, compared with other machine learning methods, this method significantly improves recall, precision, and F1 value in network application classification, especially for campus network traffic.

(3) To address missing traffic features in encrypted traffic classification, a classification method combining CNN (Convolutional Neural Networks) and SAEN (Stacked Autoencoder Networks) is proposed. The method converts the raw traffic data into traffic graphs of the same size and uses a CNN to extract high-level spatial features of the traffic graph for classification. However, some traffic information is lost during the traffic graph transformation, which degrades classification performance. SAEN is therefore used to reduce the dimensionality of the flow statistics and extract the reduced features, and the two kinds of features are used together as the classifier input. This method solves the problem of feature set construction in encrypted traffic classification: by combining the spatial features and the statistical features of traffic, it achieves a multi-dimensional combination of traffic features and improves the feature set. Experiments on several datasets show that the average recall of the method exceeds 97%, and that it exceeds 98% on real campus network traffic, which meets the needs of practical application.

(4) To improve the efficiency of anomaly detection, work is carried out in two directions: first, reducing detection time by combining deep learning with other anomaly traffic detection methods; second, optimizing the deep learning model to reduce its training and classification time. On this basis, a new Hybrid Method of Entropy and SSAE-SVM (HESS) is proposed to detect anomalous traffic. HESS is an anomaly traffic detection mechanism that includes data collection, anomaly traffic detection, and attack defense. Its detection methods comprise an initial detection method based on information entropy and a deep detection method based on SSAE-SVM (Sparse Auto Encoder-Support Vector Machine). The initial detection method computes the information entropy of traffic characteristics, packet by packet, over a small time scale and defines a confidence interval to flag attack traffic. Although this method has a high false positive rate, as a preliminary step it effectively speeds up the detection of abnormal traffic. The deep detection method combines SSAE and SVM: SSAE reduces the dimensionality of the traffic features, and SVM classifies the traffic using the reduced features. Experimental results show that HESS effectively reduces computational complexity and detection time while maintaining a high recognition rate, high accuracy, and a low false positive rate in abnormal traffic detection. HESS can also effectively defend against attacks.
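
The entropy-based initial detection step can be illustrated with a minimal sketch. The abstract does not state which traffic characteristic is measured, how windows are formed, or how the confidence interval is defined, so the choices below are assumptions made only for illustration: the characteristic is the destination address of each packet, a window is a short list of packets, and the interval is the baseline mean entropy plus or minus k standard deviations. The function names are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter
from statistics import mean, stdev


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_interval(baseline_windows, k=3.0):
    """Assumed confidence interval: mean +/- k * std of baseline window entropies.

    `baseline_windows` is a list of windows of attack-free traffic; each window
    is a list of per-packet feature values (here: destination addresses).
    At least two baseline windows are needed to estimate the deviation.
    """
    entropies = [shannon_entropy(w) for w in baseline_windows]
    mu, sigma = mean(entropies), stdev(entropies)
    return mu - k * sigma, mu + k * sigma


def is_suspicious(window, interval):
    """Flag a window whose entropy falls outside the baseline interval."""
    low, high = interval
    return not (low <= shannon_entropy(window) <= high)
```

A window in which almost every packet targets one destination has entropy close to zero and falls below such an interval, so it would be passed on to the slower second-stage classifier. A cheap filter of this kind is expected to raise false alarms, which is consistent with the high false positive rate the abstract attributes to the initial stage.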
Keywords/Search Tags: cyber security, Internet traffic classification, anomaly detection, deep learning, information entropy