Research On Network Traffic Classification Based On Clustering Analysis

Posted on:2010-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z K He

Full Text:PDF

GTID:2178360302455702

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, many types of Internet applications (such as FTP, DNS, and P2P) have appeared. Traditional port-based and payload-based methods for classifying network traffic by application type have become less effective, because applications increasingly communicate over non-standard ports and encrypted protocols. This situation has motivated many researchers in China and abroad to study network traffic classification with machine learning methods. These methods classify traffic by application type using the statistical characteristics of the flows that applications generate when they communicate over a network. This thesis likewise applies machine learning methods and related technologies to network traffic classification. The work includes network traffic data collection, generation of statistical features, labeling of flow samples, feature selection, and classification of network traffic by application type.

In traffic classification based on machine learning, gathering network flow samples (both training and test samples) is very important. First, network packets are captured using port mapping (mirroring) on the core switch of the campus network. After collection, the packets are grouped into flows according to the five-tuple (source IP address, source port number, destination IP address, destination port number, protocol). Then, statistics of these packets (such as packet size, packet count, timing, and flag bits) are computed to generate the feature vector that represents each network flow. Finally, each flow is labeled automatically with its application type using port-based, payload-based, and protocol-based methods, which produces the flow samples.

For feature selection, two methods, principal component analysis and information gain, are applied to the candidate feature sets of two datasets to obtain an optimized feature subset for each.
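
As an illustration of the information gain criterion mentioned above, the following minimal sketch ranks features by how much knowing each feature reduces the entropy of the application labels. It is not the thesis implementation: the feature names, the toy values, and the assumption that features have already been discretized are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-discretized flow features (toy values, not thesis data).
names = ["mean_pkt_size", "duration", "pkt_count_parity"]
rows = [
    ("small", "short", "even"),
    ("small", "short", "odd"),
    ("large", "long", "even"),
    ("large", "short", "odd"),
]
labels = ["DNS", "DNS", "FTP", "FTP"]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data, `mean_pkt_size` separates the two classes perfectly and ranks first, while `pkt_count_parity` carries no information about the label and ranks last. Keeping only the top-ranked features is one simple way to form a reduced feature subset.
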
The experimental results show that feature selection reduces the number of features, which shortens learning and classification time, and removes irrelevant or redundant features, which increases classification accuracy.

Finally, two clustering algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-Means, are applied to cluster the network flows after dimensionality reduction. Clustering-based classification rules are established from the clustering results, and a large number of experiments are carried out. The experimental results show that these methods achieve high precision and overall accuracy when applied to network traffic classification. The algorithms are also efficient and easy to implement, so they are well suited to network traffic classification, and the approach has strong research significance and practical value.
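
One way to turn clustering results into classification rules is sketched below, using K-Means only (DBSCAN is not shown). The abstract does not say how the rules are derived, so the majority-vote mapping from cluster to application type, the nearest-centroid classification of new flows, the seeding with the first k points, and the toy feature values are all assumptions for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def label_clusters(points, labels, centroids):
    """Map each cluster to the most common application label among its training flows."""
    votes = {}
    for p, label in zip(points, labels):
        votes.setdefault(nearest(p, centroids), Counter())[label] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


def classify(flow, centroids, cluster_label):
    """Give a new flow the label of its nearest cluster, or 'unknown' if unlabeled."""
    return cluster_label.get(nearest(flow, centroids), "unknown")


# Toy training flows: (mean packet size in bytes, flow duration in seconds).
# Real features should be scaled first so that no single one dominates the distance.
train = [(80, 0.1), (1400, 30.0), (90, 0.2), (1350, 25.0), (85, 0.15), (1450, 40.0)]
train_labels = ["DNS", "FTP", "DNS", "FTP", "DNS", "FTP"]

centroids = kmeans(train, k=2)
cluster_label = label_clusters(train, train_labels, centroids)
print(classify((100, 0.3), centroids, cluster_label))
print(classify((1300, 20.0), centroids, cluster_label))
```

With this toy data, the small, short flow is assigned to the cluster dominated by DNS and the large, long flow to the cluster dominated by FTP.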

Keywords/Search Tags:

network classification, feature selection, principal component analysis, information gain, DBSCAN algorithm, K-Means algorithm, clustering analysis

Related items

1	Research On Quantum Feature Selection And Principal Component Analysis Algorithms
2	Research On Feature Extraction, Selection And Classification Algorithms For Pulmonary CAD
3	Study On Improvement Of K-means Clustering Algorithm
4	Research On Feature Extraction Based On Principal Component Analysis
5	Study Of Feature Extraction And Classification Algorithm For HRRP
6	Study On Weld Defect Model And Classification Algorithm Of X-ray Submerged Arc Welding
7	Research On Face Recognition Algorithm Based On Principal Component Analysis
8	Research On Feature Selection Algorithm Based On Kernel Sparse And Principal Component Analysis
9	Research Of Solutions For The Customer Segmentation Based On The Text Clustering Algorithm
10	Research And Application Of Clustering Algorithm Based On Bigdata