
Deep Learning For Network Traffic Classification And Anomaly Detection

Posted on: 2019-07-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Wang | Full Text: PDF
GTID: 1318330542994139 | Subject: Control Science and Engineering
Abstract/Summary:
With the continuous growth of networks and the ongoing development of network applications, the Internet has become indispensable infrastructure for production and daily life. At the same time, network attacks such as denial-of-service attacks, computer worms, and ransomware are becoming increasingly rampant and pose serious risks to cyberspace security. As an effective means of network protection, network traffic anomaly detection can detect unknown attack behavior and provides important support for network situational awareness, and it has attracted growing attention from researchers in recent years. Many network anomaly detection methods have been proposed, and those based on network traffic classification form an important family. However, most network traffic classification methods rely on traditional machine learning, so their classification performance depends heavily on the design of traffic features. Designing a feature set that accurately describes traffic characteristics requires extensive human experience and feature engineering, and it remains an unsolved problem. In recent years, deep learning has achieved excellent results in computer vision, speech recognition, and natural language processing, and it brings new opportunities for network traffic classification and anomaly detection. This dissertation applies deep learning to the above problems. The main research work and innovations are as follows.

1. Malware traffic classification based on representation learning. To address the feature dependence of malware traffic classification methods based on traditional machine learning, a malware traffic classification method based on representation learning is proposed. Unlike traditional machine learning methods built on feature engineering, this method does not require manually extracting and selecting a feature set of network traffic. Instead, it uses the raw network traffic as the input of a deep neural network. The deep neural network carries out the entire representation learning process, which saves a great deal of feature engineering work and reduces the complexity of the task. A series of experiments shows that the best network traffic representation is bidirectional communication data that includes all protocol layers. Experiments are carried out in two application scenarios with three types of classifiers. The results show that the proposed method meets the requirements of practical application in terms of accuracy, precision, recall, and F1 score.

2. End-to-end encrypted traffic classification based on 1D-CNN. Traffic classification methods based on a divide-and-conquer strategy have difficulty reaching the global optimum. To address this, an end-to-end traffic classification method based on a one-dimensional Convolutional Neural Network (1D-CNN) is proposed. The method integrates feature extraction, feature selection, and the classifier into a single end-to-end framework that automatically learns the nonlinear relationship between the raw input and the expected output, and is therefore more likely to reach the global optimum. The dissertation uses a 1D-CNN as the end-to-end framework because it suits the one-dimensional sequential nature of network traffic better than the commonly used two-dimensional convolutional neural network. Experiments show that the method achieves excellent performance on a public encrypted traffic dataset. Of the 12 experimental results across four experimental scenarios, 11 were superior to the state-of-the-art method. In particular, for the classification of VPN-encrypted traffic, the proposed method improves accuracy and recall by about 10%.

3. Network traffic classification based on two-stage LSTM. Current network traffic classification methods that use deep learning do not make full use of the structural information in network traffic. To address this, a network traffic classification method is proposed that applies two-stage Long Short-Term Memory (LSTM) networks at two levels: the network packet and the network flow. The method uses bidirectional LSTMs to learn packet-level and flow-level features respectively, then combines them into more comprehensive traffic features to obtain more accurate classification results. It takes full account of the internal structure and organizational relationships of network traffic and makes effective use of the strong temporal feature learning ability of LSTM. Experimental results show that the method performs well on a public traffic dataset, with most performance metrics equal to or better than the state-of-the-art results.

4. Network traffic anomaly detection based on hierarchical spatial-temporal feature learning. To address two common problems in network anomaly detection, feature dependence and a high False Alarm Rate (FAR), a network traffic anomaly detection method named HAST-NAD, based on hierarchical spatial and temporal feature learning, is proposed. The method uses a CNN to learn the low-level spatial features of network traffic and then uses a bidirectional LSTM to learn the higher-level temporal features. The learning process is completed automatically by a deep neural network without any feature engineering, which avoids the inaccuracy of manually designed traffic features. The automatically learned traffic features also effectively reduce the false alarm rate. Experimental results on two classic public test datasets, DARPA1998 and ISCX2012, show that HAST-NAD achieves a low false alarm rate together with high accuracy and detection rate, giving a better overall detection performance than other published anomaly detection methods. In particular, on the DARPA1998 dataset, the proposed method improves the Effectiveness Measure by about 24% compared to MARK-ELM, a method based on feature engineering.

The above methods have been partially applied to two Strategic Priority projects of the Chinese Academy of Sciences: "Real-time Processing System of Massive Network Traffic Based on Sea-cloud Collaboration" (Grant No. XDA06011203) and "The Research of the Future Network Architecture and Edge Equipment Development" (Grant No. XDA060110302).
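The idea in contribution 1 of feeding raw traffic bytes to a neural network, rather than hand-designed features, can be illustrated with a minimal preprocessing sketch. The abstract does not describe the preprocessing steps, so everything here is an assumption made for illustration: the function name `flow_to_vector`, the fixed input length (the default of 784 is an arbitrary choice, not a value from the abstract), zero-padding, and scaling bytes to [0, 1].

```python
from typing import List


def flow_to_vector(payload: bytes, length: int = 784) -> List[float]:
    """Turn the raw bytes of one flow or session into a fixed-length input.

    Hypothetical helper: truncates or zero-pads to `length` bytes, then
    scales each byte to [0, 1]. The default length is illustrative only.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    trimmed = payload[:length]                        # keep the first `length` bytes
    padded = trimmed + bytes(length - len(trimmed))   # zero-pad short flows
    return [b / 255.0 for b in padded]                # normalize each byte


# Usage: a short made-up byte string padded to 8 values.
example = flow_to_vector(bytes([0x45, 0x00, 0xFF]), length=8)
print(len(example), example[:3])
```

A vector of this kind could be passed directly to a one-dimensional model, or reshaped into a two-dimensional grid for an image-style classifier. Which of the two the dissertation uses for each contribution is stated only at the level of detail given in the abstract above.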
Keywords/Search Tags: cyberspace security, situation awareness, traffic classification, anomaly detection, representation learning, end-to-end learning, deep neural networks, deep learning