Research On Network Traffic Classification Based On Clustering Analysis

Posted on: 2010-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z K He
Full Text: PDF
GTID: 2178360302455702
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, many types of Internet applications (such as FTP, DNS, and P2P) have appeared. The traditional port-based and payload-based methods for classifying network traffic by application type have become ineffective because applications increasingly communicate over non-standard ports and encrypted protocols. This situation has motivated many researchers at home and abroad to classify network traffic with machine learning methods, which identify the application type from the statistical characteristics of the flows that applications generate when they communicate over a network. This thesis likewise applies machine learning methods and related techniques to network traffic classification. The work includes network traffic data collection, generation of statistical features, labeling of flow samples, feature selection, and classification of network traffic by application type.

In machine-learning-based network traffic classification, gathering network flow samples (both training and test samples) is very important. First, network packets are captured by port mapping on the central switch of the campus network. After collection, the packets are grouped into flows according to the five-tuple (source IP address, source port number, destination IP address, destination port number, protocol). Next, statistics of these packets (such as packet size, packet count, timing, and flag bits) are computed to generate the feature vector that represents each network flow. Finally, each flow is labeled automatically using port-based, payload-based, and protocol-based methods to form the flow samples.

For feature selection, two methods, principal component analysis and information gain, are applied to the candidate feature sets of two datasets to obtain an optimized feature subset for each.
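As an illustration of the information gain criterion mentioned above, the following is a minimal sketch and not the thesis's implementation. It assumes the flow features have already been discretized into buckets; the feature names, bucket values, and toy flows are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy flows: (packet-size bucket, duration bucket).
rows = [("small", "short"), ("small", "long"), ("large", "short"), ("large", "long")]
labels = ["DNS", "DNS", "FTP", "FTP"]
ranking = rank_features(rows, labels, ["pkt_size", "duration"])
```

In this toy data the packet-size bucket separates the two applications perfectly while the duration bucket carries no information, so a selection step would keep the former and drop the latter.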
The experimental results show that feature selection reduces the number of features, and with it the learning and classification time, and also removes irrelevant or redundant features, which increases classification accuracy.

Finally, two clustering algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-Means, are applied to cluster the network flows after dimensionality reduction. Clustering-based classification rules are established from the clustering results, and a large number of experiments are carried out. The experimental results show that these methods lead to higher precision and overall accuracy when applied to network traffic classification; the algorithms are also efficient and easy to implement, which makes them a good approach to network traffic classification with strong research significance and practical value.
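One common way to turn clustering results into classification rules is to label each cluster with the majority application among its training flows and then assign a new flow the label of its nearest cluster. The abstract does not say how the thesis builds its rules, so the sketch below is an assumption: it uses a plain K-Means with deterministic seeding, and the two features (mean packet size, flow duration) and all sample values are hypothetical.

```python
import math
from collections import Counter

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(members)
        ]
    return centroids

def label_clusters(points, labels, centroids):
    """Map each cluster index to the majority label of its training flows."""
    votes = {}
    for p, y in zip(points, labels):
        votes.setdefault(nearest(p, centroids), Counter())[y] += 1
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}

def classify(flow, centroids, cluster_labels):
    """Assign a flow the label of its nearest cluster, or 'unknown'."""
    return cluster_labels.get(nearest(flow, centroids), "unknown")

# Hypothetical training flows: (mean packet size in bytes, duration in seconds).
train = [(60.0, 0.1), (1400.0, 30.0), (70.0, 0.2), (1350.0, 25.0)]
train_labels = ["DNS", "FTP", "DNS", "FTP"]

centroids = kmeans(train, k=2)
cluster_labels = label_clusters(train, train_labels, centroids)
prediction_small = classify((80.0, 0.3), centroids, cluster_labels)
prediction_large = classify((1300.0, 20.0), centroids, cluster_labels)
```

A DBSCAN variant would follow the same labeling idea, with the difference that flows marked as noise have no cluster and would fall back to an "unknown" class.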
Keywords/Search Tags: network classification, feature selection, principal component analysis, information gain, DBSCAN algorithm, K-Means algorithm, clustering analysis