
Study Of P2P Flow Measurement And Identification Method

Posted on: 2009-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Liu
GTID: 1118360272472246
Subject: Computer system architecture
Abstract/Summary:
P2P (peer-to-peer) is a new model of network application in which resources are self-organized and shared by the edge nodes of the network rather than by central nodes. P2P networks are typically used for file sharing, media streaming, instant messaging and similar services. Alongside its rapid growth in recent years, P2P has created new problems for network management, such as heavy bandwidth consumption and network security risks. Because most P2P applications use dynamic, random port numbers and encrypt their data, traditional port matching has become ineffective for identifying P2P flows. Identifying P2P flows has therefore become the central problem of P2P traffic management.
In this dissertation, four areas of P2P flow measurement and identification are studied in depth: measurement of a typical P2P system, heuristic identification, discovery of unknown P2P applications, and machine learning methods.
BitTorrent is a relatively recent yet highly successful P2P protocol focused on efficient content delivery. To better understand the protocol, an active measurement system built on a modified BitTorrent client is designed; it records detailed information on all exchanged messages and protocol events. Experimental evaluation shows that the peers from which the local peer downloads the most are also the peers to which it uploads the most bytes. For passive measurement, a BitTorrent measurement method based on application-layer signatures is presented. The measurement framework has two parts: connection tracking and application-layer signature matching. A hash algorithm based on the XOR operation is proposed for connection tracking, and by matching BitTorrent application-layer signatures the method identifies BitTorrent flows accurately. The length and inter-arrival characteristics of BitTorrent flows are then analyzed.
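The two-part passive measurement framework (connection tracking plus application-layer signature matching) can be illustrated with a minimal Python sketch. The abstract does not give the exact XOR hash, so `xor_flow_hash`, `ConnectionTracker`, `TABLE_SIZE` and the other names below are hypothetical: the hash simply XORs the two endpoints so that both directions of a connection land in the same bucket. The signature check uses the standard plaintext BitTorrent handshake prefix (the byte 19 followed by the string "BitTorrent protocol"); it is an assumption that the dissertation's signature set includes this pattern.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical table size; the dissertation does not state its parameters.
TABLE_SIZE = 1 << 16

# Plaintext BitTorrent handshake prefix: length byte 19, then "BitTorrent protocol".
BT_HANDSHAKE = bytes([19]) + b"BitTorrent protocol"


def xor_flow_hash(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: int) -> int:
    """Hypothetical XOR hash: symmetric in the two endpoints, so A->B and B->A share a bucket."""
    ip_mix = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    port_mix = src_port ^ dst_port
    h = ip_mix ^ (port_mix << 16) ^ proto
    # Fold the high bits into the low bits before reducing to a table index.
    return (h ^ (h >> 16)) % TABLE_SIZE


@dataclass
class Connection:
    packets: int = 0
    byte_count: int = 0
    is_bittorrent: bool = False


class ConnectionTracker:
    """Hypothetical chained hash table of bidirectional connections."""

    def __init__(self) -> None:
        # Each bucket stores (canonical key, Connection) pairs to resolve collisions.
        self.buckets: list[list[tuple[tuple, Connection]]] = [[] for _ in range(TABLE_SIZE)]

    @staticmethod
    def canonical_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: int) -> tuple:
        # Order the endpoints so both directions produce the same key.
        lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        return (lo, hi, proto)

    def update(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               proto: int, payload: bytes) -> Connection:
        """Account one packet to its connection and run the signature check on its payload."""
        bucket = self.buckets[xor_flow_hash(src_ip, src_port, dst_ip, dst_port, proto)]
        key = self.canonical_key(src_ip, src_port, dst_ip, dst_port, proto)
        for existing_key, conn in bucket:
            if existing_key == key:
                break
        else:  # no matching entry in this bucket: start tracking a new connection
            conn = Connection()
            bucket.append((key, conn))
        conn.packets += 1
        conn.byte_count += len(payload)
        # Application-layer signature match: flag the connection once the handshake is seen.
        if not conn.is_bittorrent and payload.startswith(BT_HANDSHAKE):
            conn.is_bittorrent = True
        return conn


if __name__ == "__main__":
    tracker = ConnectionTracker()
    # Handshake followed by zeroed placeholders for reserved bytes, info_hash and peer_id.
    handshake = BT_HANDSHAKE + bytes(8 + 20 + 20)
    tracker.update("10.0.0.1", 51413, "192.0.2.7", 6881, 6, handshake)
    reply = tracker.update("192.0.2.7", 6881, "10.0.0.1", 51413, 6, b"\x00\x00\x00\x01\x02")
    print(reply.packets, reply.is_bittorrent)  # 2 True
```

XOR is cheap and symmetric, which suits bidirectional tracking; because distinct flows can collide under such a simple hash, the sketch compares full canonical keys inside each bucket. A plaintext signature like this one will not match BitTorrent connections that use protocol encryption.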
The analysis finds that BitTorrent flow inter-arrival times follow a Weibull distribution and BitTorrent flow lengths follow a lognormal distribution.
Heuristic methods for identifying P2P applications are studied next. BEH, a P2P host identification algorithm based on multiple characteristics, is proposed. First, several behaviors inherent to P2P traffic are explored and translated into metrics: the ratio of incoming to outgoing connections, the entropy of remote hosts' IP addresses, and the use of high ports. Combining these three metrics, BEH showed a low false-positive rate in experiments. A P2P flow classification method based on the support vector machine (SVM) is also proposed, focusing on four P2P applications: BitTorrent, eMule, PPLive and PPStream. Experimental results confirm the validity of the method, with an average precision rate of 92.2%.
A new flow analysis method called MCT, based on a multi-dimensional clustering tree, is proposed. First, each dimension of the flow data is clustered hierarchically to identify the dominant flows. After the significant one-dimensional rules are mined, they are combined using the multi-dimensional clustering tree to find significant multi-dimensional rules. An MCT-based method for identifying unknown P2P applications is then presented: a metric, Sp2p, is defined from the entropy of IP addresses and IP prefixes and the bidirectional property of P2P flows, and is used to identify P2P flows. The results show that multi-dimensional flow mining gives a clear picture of the composition of current network traffic, and that the system can identify a variety of P2P flows that account for a large proportion of total traffic.
Machine learning techniques offer a promising way to classify flows by application protocol. A two-phase combined feature selection algorithm called ESBS is designed. In the first phase, an entropy-based method filters out irrelevant features. In the second phase, a backward sequential search removes redundant features according to the performance of the induction algorithm. Using ESBS, 11 features are selected from the 249 features of the Andrew Moore datasets. A semi-supervised clustering method called PSOSC is proposed for application-level flow classification. First, a novel K-means clustering algorithm based on Particle Swarm Optimization is presented that handles a few labeled flows together with many unlabeled flows. Then the labeled flows are used to map the clusters to applications. Experimental evaluation on the Andrew Moore datasets shows that high flow classification accuracy can be achieved.
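The two phases of ESBS can be sketched as a filter followed by a wrapper. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the features are already discretized, uses information gain as the entropy criterion, and substitutes leave-one-out 1-nearest-neighbour accuracy for the unspecified induction algorithm. The threshold `gain_threshold` and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Row = Sequence[float]
Evaluator = Callable[[Sequence[Row], Sequence[str], Sequence[int]], float]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[float], labels: Sequence[str]) -> float:
    """Class entropy minus the class entropy conditioned on a discretized feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def loo_1nn_accuracy(X: Sequence[Row], y: Sequence[str], feats: Sequence[int]) -> float:
    """Hypothetical induction algorithm: leave-one-out 1-nearest-neighbour accuracy."""
    if not feats:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in feats),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def esbs(X: Sequence[Row], y: Sequence[str], gain_threshold: float = 0.1,
         evaluate: Evaluator = loo_1nn_accuracy) -> tuple[list[int], float]:
    # Phase 1 (filter): drop features whose information gain is below the threshold.
    kept = [f for f in range(len(X[0]))
            if information_gain([row[f] for row in X], y) >= gain_threshold]
    # Phase 2 (wrapper): backward sequential search on the induction algorithm's accuracy.
    best = evaluate(X, y, kept)
    while len(kept) > 1:
        score, drop = max((evaluate(X, y, [g for g in kept if g != f]), f) for f in kept)
        if score < best:
            break  # every single removal lowers accuracy, so stop
        kept.remove(drop)
        best = score
    return kept, best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 duplicates it, feature 2 is constant.
    X = [[0, 0, 5], [0, 0, 5], [0, 0, 5], [1, 1, 5], [1, 1, 5], [1, 1, 5]]
    y = ["bt", "bt", "bt", "web", "web", "web"]
    print(esbs(X, y))  # ([0], 1.0)
```

The filter phase is cheap and removes features that carry no class information on their own (feature 2 in the toy data); the wrapper phase costs more but can drop features that are redundant given the others, such as feature 1, which the entropy filter alone would keep.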
Keywords/Search Tags: P2P, BitTorrent, Behaviour characteristics, Support vector machine, Feature selection, Semi-supervised clustering, Machine learning