The Research On Identification Of P2P Traffic

Posted on:2012-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhu

Full Text:PDF

GTID:2218330368488231

Subject:Circuits and Systems

Abstract/Summary:

With the continuous development of the Internet, P2P (Peer-to-Peer) technology has brought great convenience to people's lives, owing to a network structure and processing efficiency superior to the traditional C/S (client/server) model. As demand for Internet applications rises, P2P-based file sharing, voice services, and streaming media have developed rapidly, but the structural characteristics of P2P make network management and maintenance difficult. P2P applications occupy a large share of bandwidth, causing network congestion and disrupting the normal use of other services. P2P technology also changes constantly to evade regulation: random ports, tunneling mechanisms, and application-layer encryption render conventional traffic identification methods ineffective. Accurate and effective identification of P2P traffic has therefore become the primary task in P2P traffic control.

Firstly, this dissertation analyzes the existing methods of P2P traffic identification, including those based on port numbers, deep packet inspection, traffic characteristics, and machine learning. Because identification based on machine learning is the current research focus in the field, the dissertation examines several popular machine learning algorithms in detail.

Secondly, for feature selection in P2P traffic identification, this dissertation studies the relevant feature selection methods and analyzes the applicability of two typical algorithms to P2P traffic identification: the Correlation-based Feature Selection (CFS) algorithm and the Consistency-based Feature Selection (CON) algorithm. The experimental results show that feature selection with the CFS algorithm preserves the high accuracy of the identification algorithm while shortening both training time and identification time.

Finally, to address the drop in accuracy when the proportion of labeled training samples is low, this dissertation proposes a semi-supervised Affinity Propagation (AP) clustering algorithm whose core idea is to use a small number of labeled samples as the supervision strategy for clustering. The specific implementation steps are: (1) a certain percentage of the samples are labeled first and compete to become the exemplars of clusters; (2) the samples are clustered through message passing between the labeled samples; (3) the "marks-category" mapping rules are used to complete P2P traffic identification. The dissertation also studies how the two key parameters of the algorithm, the damping factor λ and the preference parameter p, affect its performance, and gives recommended values for practical application. The experimental results show that, compared with the Naive Bayes with kernel estimation (NBK) algorithm and the semi-supervised K-means algorithm, the proposed algorithm achieves a higher accuracy rate and a lower error rate for P2P traffic identification when the proportion of labeled training samples is less than 20%. When applied to P2P traffic identification, the algorithm can therefore maintain identification performance while reducing the effort spent on labeling training samples, which gives it higher practical value in the traffic identification field.
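The three steps of the semi-supervised AP algorithm can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and it rests on several assumptions: similarity is the negative squared Euclidean distance, unlabeled samples are kept from becoming exemplars by a very low preference, each sample takes the label of its most similar exemplar, and the function name, the toy two-feature flows, and the values chosen for the damping factor and the preference p are all hypothetical.

```python
def neg_sq_dist(a, b):
    # Similarity used in this sketch (an assumption): negative squared distance.
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def semi_supervised_ap(points, labels, preference, damping=0.9, iters=200):
    """labels maps the index of each labeled sample to its class."""
    n = len(points)
    blocked = -1e9  # assumption: keeps unlabeled samples from becoming exemplars
    S = [[neg_sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    for k in range(n):
        # Step (1): only labeled samples compete as exemplars.
        S[k][k] = preference if k in labels else blocked
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # Step (2): damped message passing of standard affinity propagation.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            max1 = vals[first]
            max2 = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (max2 if k == first else max1)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from all i != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in sorted(labels) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise ValueError("no labeled sample emerged as an exemplar")
    # Step (3): map each sample to the class of its most similar exemplar.
    predicted = []
    for i in range(n):
        best = i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        predicted.append(labels[best])
    return exemplars, predicted

# Toy flows described by two hypothetical features; only two are labeled.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = {0: "P2P", 4: "other"}
exemplars, predicted = semi_supervised_ap(points, labels, preference=-10.0)
print(exemplars, predicted)
```

In this toy setting the two labeled flows end up as the exemplars of the two groups, and the six unlabeled flows inherit their classes. The damping and preference values here are illustrative only; the recommended values are given in the full text of the dissertation, not in this abstract.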

Keywords/Search Tags:

P2P Traffic Identification, Affinity Propagation, Semi-Supervised Clustering, Machine Learning

Related items

1	Improvement And Application Of Affinity Propagation Clustering Algorithm Based On Semi-Supervised Learning
2	Research And Application Of Semi-supervised Clustering Algorithms
3	Research On High-speed IP Service Awareness Based On Traffic Measurement
4	Research On Two Clustering Algorithms Based On Semi-Supervised Learning
5	Some Affinity Propagation Clustering Algorithms Based On Swarm Intelligence Optimization And Their Applications
6	Research On Telecom Customer Segmentation Based On Semi-Supervised Affinity Propagation Clustering
7	Affinity Propagation Clustering Algorithm For The Data With Complex Structure
8	Research On Affinity Propagation Clustering Algorithm Based On Manifold Distance And Density Adjustment
9	Affinity Propagation Clustering Algorithm Theory Of Improving And Its Application
10	Improved Affinity Propagation Clustering Algorithms And Their Applications