
Research On Network Traffic Classification Based On Behavior

Posted on: 2014-12-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Yan
GTID: 1228330467463705
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, network bandwidth and the number of network applications have increased significantly, and network traffic has become increasingly diverse and complex. Although the network brings great convenience to people's daily lives, it also makes network management more difficult and introduces serious security risks. Accurate, online network traffic classification is crucial for network management, network security monitoring, and lawful interception of network data.

Traditional traffic classification technologies include the port-number-based method and deep packet inspection (DPI). Because more and more network applications use dynamic port numbers and encryption to avoid detection, the port-based and DPI methods have become ineffective for increasingly complex networks. To classify network traffic more effectively, researchers have proposed machine learning methods and behavior-characteristic-based methods. These technologies make use of the behavior of traffic flows and the behavior of hosts, respectively, and can both be regarded as behavior-based traffic classification. Both overcome the limitations of the port-based and DPI methods, and have therefore attracted considerable attention from both academia and industry.

This dissertation studies behavior-based network traffic classification in depth, aiming at more accurate, faster, and more robust classification. It addresses traffic identification based on the network behavior of hosts and flows, out-of-sequence traffic classification, real-time traffic classification based on co-training, and traffic classification based on ensemble learning.
The main contributions of this thesis are as follows:

(1) Network traffic identification based on host-level and flow-level behavior characteristics: This method has two stages, a host-level stage and a flow-level stage, and focuses on P2P traffic identification. First, we determine whether a host takes part in a P2P application by matching its behavior against predefined host-level behavior profiles. We then refine the identification by comparing the statistical features of each of the host's flows with several flow feature profiles. The host-level behavior profiles include several characteristics of a typical P2P host, such as the IP population ratio, the ratio of forward to backward bytes, the port ratio of an <IP, Port> pair, and the failed connection ratio. The flow-level behavior profiles consist of properties of P2P flows, e.g. long flow duration and large byte count, the average byte count of signaling flows, and typical characteristics of the BitTorrent, Skype, and eDonkey applications. The experimental results show that this approach achieves classification accuracy of 93.1% and 95.1% in terms of flows and bytes respectively, leaving as little as 2.3% of flows and 1.9% of bytes unclassified.

(2) Out-of-sequence traffic classification based on improved dynamic time warping (IDTW): Many machine learning methods have been proposed for network traffic classification and have shown good results. However, when applied to traffic with out-of-sequence packets, the accuracy of existing machine learning approaches decreases dramatically. We summarize the different out-of-sequence situations and observe that the main reason is that out-of-sequence packets change the spatial representation of the feature vectors, so the linear mapping relation among features that machine learning approaches rely on no longer holds.
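The effect of out-of-sequence packets on a feature vector can be seen in a small sketch. It assumes, purely for illustration, that a flow's feature vector is the sizes of its first four packets; the packet sizes and the `euclidean` helper are hypothetical and are not taken from the dissertation.

```python
import math


def euclidean(a, b):
    """Position-by-position Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical packet sizes (bytes) of the first four packets of one flow.
in_order = [100, 1500, 60, 1500]
# The same flow observed with its first two packets swapped in transit.
reordered = [1500, 100, 60, 1500]

# The two observations contain exactly the same packets ...
same_packets = sorted(in_order) == sorted(reordered)
# ... yet a position-wise comparison places them far apart in feature space.
d = euclidean(in_order, reordered)

print(same_packets, round(d, 1))
```

A classifier that compares features position by position would treat these two observations of the same flow as very different points, which is the kind of mismatch that a sequence-alignment measure is meant to absorb.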
This dissertation proposes an improved dynamic time warping (IDTW) method, which handles all of the out-of-sequence situations by relaxing the boundary and monotonicity constraints of classical dynamic time warping. The experimental results show that the classification accuracy of IDTW is 24% to 41% higher than that of existing machine learning approaches.

(3) Online traffic classification based on co-training: This dissertation investigates the co-training method for online traffic classification. Co-training is a semi-supervised method that can achieve high accuracy with few labeled training samples. The algorithm needs two separate feature sets, each sufficient to train a good classifier. We choose the packet sizes and the inter-packet times of the first packets of a traffic flow as the two features. However, inter-packet time depends on network conditions and is affected by network jitter. This dissertation therefore constructs a robust inter-packet time feature named "Netipt", the average inter-packet time of flows in the same subnet. By exploiting the correlation among traffic flows that share the same source and destination subnet addresses, Netipt is more resilient to network jitter. The experimental results show that with Netipt the classification accuracy is 2.9% to 8% higher than with other inter-packet time features, and that co-training substantially improves classification accuracy even when very few labeled training samples are available.

(4) Traffic classification based on a weighted confidence ensemble technique: Every classification technique has disadvantages, so none achieves the highest accuracy on all traffic classification tasks. Ensemble learning combines individual classifiers to achieve higher accuracy. In this dissertation, we propose a weighted confidence ensemble method for traffic classification.
The weighted confidence ensemble method first calculates the confidence values inferred by each individual classifier, then assigns a weight to each classifier according to its prediction accuracy on a validation traffic dataset, and finally takes the weighted average of the confidences as the classification result. The experimental results demonstrate that the weighted confidence combination classifier outperforms the individual classifiers. Combined with the IDTW and co-training algorithms, the weighted confidence ensemble framework achieves higher classification accuracy on traffic datasets that contain only a few labeled samples as well as out-of-sequence flows.
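The three steps of the weighted confidence ensemble can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name `weighted_confidence_vote`, the class labels, the sample numbers, and the choice to set each weight to validation accuracy divided by the sum of accuracies are all assumptions, since the abstract only states that weights follow validation accuracy.

```python
def weighted_confidence_vote(confidences, val_accuracies):
    """Combine per-classifier confidences into one prediction.

    confidences: one dict per classifier, mapping class label -> confidence.
    val_accuracies: each classifier's accuracy on a validation set.
    """
    # Assumed weighting scheme: accuracy normalized so the weights sum to 1.
    total = sum(val_accuracies)
    weights = [acc / total for acc in val_accuracies]

    # Weighted average confidence for every class seen by any classifier.
    classes = set().union(*confidences)
    combined = {
        c: sum(w * conf.get(c, 0.0) for w, conf in zip(weights, confidences))
        for c in classes
    }

    # Predict the class with the highest combined confidence
    # (sorted first so that ties are broken deterministically).
    label = max(sorted(combined), key=combined.get)
    return label, combined


# Hypothetical confidences from three classifiers for a single flow.
confidences = [
    {"p2p": 0.6, "web": 0.4},
    {"p2p": 0.3, "web": 0.7},
    {"p2p": 0.8, "web": 0.2},
]
val_accuracies = [0.9, 0.6, 0.5]

label, combined = weighted_confidence_vote(confidences, val_accuracies)
print(label, combined)
```

In this example the second classifier prefers "web", but the more accurate first classifier and the third classifier together outweigh it, so the combined prediction is "p2p".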
Keywords/Search Tags: Traffic classification, Traffic behavior, Out-of-sequence, Dynamic time warping, Co-training, Ensemble learning