
The Research On Identification Of P2P Traffic

Posted on: 2012-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhu
Full Text: PDF
GTID: 2218330368488231
Subject: Circuits and Systems
Abstract/Summary:
With the continuous development of the Internet, P2P (Peer-to-Peer) technology has brought great convenience to people's lives, by virtue of a network structure and processing efficiency superior to the traditional C/S (client/server) model. As demand for Internet applications has risen, P2P-based file sharing, voice services, and streaming media have developed rapidly, but the structural characteristics of P2P make network management and maintenance difficult. P2P applications occupy a large share of bandwidth, causing network congestion that affects the normal use of other services. P2P technology also changes constantly to evade regulation: random ports, tunneling, and application-layer encryption prevent conventional traffic identification methods from working effectively. Accurate and effective identification has therefore become the primary task in P2P traffic control.

Firstly, this dissertation analyzes the existing methods of P2P traffic identification, including identification based on port numbers, deep packet inspection, traffic characteristics, and machine learning. Because machine-learning-based identification is the current research focus in the field, the dissertation examines several popular machine learning algorithms in detail.

Secondly, for feature selection in P2P traffic identification, this dissertation studies the relevant feature selection methods and analyzes the applicability of two typical algorithms: the Correlation-based Feature Selection (CFS) algorithm and the Consistency-based Feature Selection (CON) algorithm.
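As a rough illustration of how CFS scores a candidate feature subset, the sketch below computes the merit heuristic from Hall's CFS formulation, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The abstract does not state which correlation measure the dissertation uses, so the correlations are simply passed in; the feature names and values are hypothetical.

```python
import math
from itertools import combinations


def cfs_merit(subset, class_corr, pair_corr):
    """CFS merit of a feature subset.

    subset     -- list of feature names
    class_corr -- dict: feature -> correlation with the class (0..1)
    pair_corr  -- dict: frozenset({f, g}) -> correlation between f and g (0..1)
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(class_corr[f] for f in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(pair_corr[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    # High class correlation raises the merit; redundancy among features lowers it.
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


# Hypothetical flow features and correlation values, for illustration only.
class_corr = {"mean_pkt_size": 0.6, "max_pkt_size": 0.6, "flow_duration": 0.6}
pair_corr = {
    frozenset({"mean_pkt_size", "max_pkt_size"}): 1.0,   # fully redundant pair
    frozenset({"mean_pkt_size", "flow_duration"}): 0.0,  # independent pair
    frozenset({"max_pkt_size", "flow_duration"}): 0.0,
}
redundant = cfs_merit(["mean_pkt_size", "max_pkt_size"], class_corr, pair_corr)
complementary = cfs_merit(["mean_pkt_size", "flow_duration"], class_corr, pair_corr)
print(redundant, complementary)
```

With these made-up values the redundant pair scores no better than a single feature, while the complementary pair scores higher. That is the behaviour that lets CFS shrink the feature set and, with it, training and identification time.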
The experimental results show that using the CFS algorithm for feature selection maintains high identification accuracy while shortening both training time and identification time.

Finally, to address the drop in accuracy when the proportion of training samples is low, this dissertation proposes a semi-supervised Affinity Propagation (AP) clustering algorithm, whose core idea is to use a small number of labeled samples as the supervision strategy for clustering. The specific implementation steps are: (1) a certain percentage of the samples are labeled first and compete as the exemplars of clusters; (2) samples are clustered through message passing between the labeled samples; (3) the "marks-category" mapping rules are used to complete P2P traffic identification. For the two key parameters of the algorithm, the damping factor λ and the preference parameter p, the dissertation also studies their effect on performance and gives recommended values for practical application. The experimental results show that, compared with the Naive Bayes with kernel estimation (NBK) algorithm and the semi-supervised K-means algorithm, the proposed algorithm achieves a higher accuracy rate and a lower error rate for P2P traffic identification when the proportion of labeled training samples is less than 20%. This means that when the algorithm is applied to P2P traffic identification, identification performance can be maintained while reducing the effort of labeling training samples, which gives the algorithm higher application value in the traffic identification field.
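The three steps can be sketched in a few lines of plain Python. This is one possible reading of the abstract, not the dissertation's implementation. It rests on three assumptions: only labeled samples are candidate exemplars, similarity is the negative squared Euclidean distance, and the "marks-category" mapping gives each unlabeled sample the label of its exemplar. The function name, the toy data, and the parameter values are hypothetical and are not the recommended values mentioned above.

```python
def semi_supervised_ap(points, labels, preference, damping=0.9, iterations=200):
    """points: list of numeric tuples; labels: dict sample index -> class name."""
    cand = sorted(labels)                  # step 1: labeled samples compete as exemplars
    n, m = len(points), len(cand)
    if m < 2:
        raise ValueError("need at least two labeled samples")

    def sim(i, k):                         # assumed similarity: -squared distance
        return -sum((u - v) ** 2 for u, v in zip(points[i], points[k]))

    # s[i][j]: similarity of sample i to candidate cand[j]; self entries hold p
    s = [[preference if i == k else sim(i, k) for k in cand] for i in range(n)]
    r = [[0.0] * m for _ in range(n)]      # responsibilities
    a = [[0.0] * m for _ in range(n)]      # availabilities

    for _ in range(iterations):            # step 2: damped message passing
        for i in range(n):
            for j in range(m):
                rival = max(a[i][q] + s[i][q] for q in range(m) if q != j)
                r[i][j] = damping * r[i][j] + (1 - damping) * (s[i][j] - rival)
        for j, k in enumerate(cand):
            pos = [max(0.0, r[i][j]) for i in range(n)]
            total = sum(pos) - pos[k]      # positive support from samples other than k
            for i in range(n):
                new = total if i == k else min(0.0, r[k][j] + total - pos[i])
                a[i][j] = damping * a[i][j] + (1 - damping) * new

    exemplars = [j for j, k in enumerate(cand) if r[k][j] + a[k][j] > 0]
    if not exemplars:                      # fallback: keep every labeled sample
        exemplars = list(range(m))

    result = []                            # step 3: map clusters to categories
    for i in range(n):
        if i in labels:
            result.append(labels[i])       # labeled samples keep their known label
        else:
            best = max(exemplars, key=lambda j: s[i][j])
            result.append(labels[cand[best]])
    return result


# Toy two-cluster data with one labeled flow per cluster (hypothetical values).
flows = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
known = {0: "p2p", 3: "web"}
print(semi_supervised_ap(flows, known, preference=-50.0))
# ['p2p', 'p2p', 'p2p', 'web', 'web', 'web']
```

In this sketch a lower preference p makes it harder for a labeled sample to become an exemplar, so fewer clusters survive. A damping factor λ closer to 1 slows the message updates and helps suppress oscillation.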
Keywords/Search Tags:P2P Traffic Identification, Affinity Propagation, Semi-Supervised Clustering, Machine Learning