
Research On Multidimensional Mining And Identification Of P2P Flow

Posted on: 2009-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Pang
GTID: 2178360278964117
Subject: Computer system architecture
Abstract/Summary:
The emergence of application networks, consisting mainly of streaming media distribution, is driving exponential growth in network traffic such as P2P flows. This growth is accompanied by DDoS (Distributed Denial of Service) attacks, worms, and other flows, which pose a serious threat to the stability and normal operation of the network. The top priority of current network management is therefore to analyze the composition of network traffic in depth, to track its nature, proportions, and changes in a timely manner, and to take corresponding measures.

Clustering network flows hierarchically by srcIP, dstIP, Protocol, srcPort, and dstPort is known as multidimensional clustering of network flows. This thesis improves the original multidimensional clustering algorithm, based on an analysis of that algorithm and of the structure of the multidimensional hierarchical clustering tree. First, Protocol, srcPort, and dstPort are used to construct three-tuple clustering rules. After the significant three-tuple rules are mined, they are combined with the one-dimensional clustering results for srcIP and dstIP to obtain the significant five-tuple rules, which completes the multidimensional clustering. The algorithm uses two new approaches to handle the diamond-shaped structure of the multidimensional clustering tree, in which a node can be reached from more than one parent, and so avoids repeated derivation and duplicate matching operations. The first constructs the tree top-down and then applies a bottom-up approach; the second restricts each duplicated node to be derived on only one branch of the tree. The improved algorithm reduces both the length of the NetFlow table that each multidimensional rule is matched against and the number of multidimensional rules that must be matched against the NetFlow table. It is therefore more efficient than the original multidimensional clustering algorithm.

The entropy of IP is defined from the distribution of srcIP and dstIP within each multidimensional rule produced by the clustering, and it describes how dispersed those addresses are. A measure, Sp2p, is defined to identify P2P flows from the entropy of IP, the IP prefix, and the two-way property of P2P flows. Sp2p is calculated for the srcIP and dstIP of each multidimensional rule, and its value is used to judge whether the rule corresponds to P2P flow.

Finally, NetFlow data from a wide area network and a local area network are used to test the performance and functionality of the system. The results show that the improved multidimensional clustering algorithm reduces the time complexity of the original; that multidimensional flow mining gives a clear picture of the composition of current network traffic; and that the system can identify a variety of P2P flows that account for a large proportion of total traffic, such as BitTorrent and PPLive.
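To make the two ideas in the abstract more concrete, the following minimal Python sketch mines significant three-tuple rules from a handful of flow records and then computes an entropy of srcIP and dstIP for each rule. It is not the thesis's algorithm. The flow records, the byte-share threshold `min_share`, the `"*"` wildcard hierarchy, and the use of plain Shannon entropy over flow counts are all assumptions made for illustration. No tree pruning is attempted, and Sp2p is not computed because the abstract does not give its formula.

```python
import math
from collections import Counter, defaultdict

ANY = "*"  # hypothetical wildcard marking a generalized port field

# Hypothetical flow records: (srcIP, dstIP, protocol, srcPort, dstPort, bytes)
FLOWS = [
    ("10.0.0.1", "192.0.2.10", "TCP", 51000, 6881, 400),
    ("10.0.0.2", "192.0.2.11", "TCP", 51001, 6881, 300),
    ("10.0.0.3", "192.0.2.12", "TCP", 51002, 6881, 200),
    ("10.0.0.4", "192.0.2.13", "TCP", 51003, 6881, 100),
    ("10.0.0.5", "198.51.100.1", "TCP", 40000, 80, 500),
    ("10.0.0.5", "198.51.100.1", "TCP", 40001, 80, 500),
]


def generalizations(proto, sport, dport):
    """Three-tuple rules matching one flow, most specific first.

    (proto, *, *) is a generalization of both middle rules, which is
    the diamond shape: one node reachable from two parents.
    """
    return [
        (proto, sport, dport),
        (proto, ANY, dport),
        (proto, sport, ANY),
        (proto, ANY, ANY),
    ]


def significant_rules(flows, min_share):
    """Return {rule: bytes} for rules carrying at least min_share of all bytes."""
    total = sum(flow[5] for flow in flows)
    volume = defaultdict(int)
    for _src, _dst, proto, sport, dport, nbytes in flows:
        for rule in generalizations(proto, sport, dport):
            volume[rule] += nbytes
    return {rule: v for rule, v in volume.items() if v / total >= min_share}


def matches(rule, flow):
    """True if the flow's protocol and ports fall under the three-tuple rule."""
    _src, _dst, proto, sport, dport, _nbytes = flow
    return (rule[0] == proto
            and rule[1] in (ANY, sport)
            and rule[2] in (ANY, dport))


def ip_entropy(addresses):
    """Shannon entropy in bits of an address list, counted per flow (assumed)."""
    counts = Counter(addresses)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def rule_entropies(rule, flows):
    """(srcIP entropy, dstIP entropy) over the flows matched by the rule."""
    matched = [flow for flow in flows if matches(rule, flow)]
    return (ip_entropy([flow[0] for flow in matched]),
            ip_entropy([flow[1] for flow in matched]))


if __name__ == "__main__":
    for rule, nbytes in significant_rules(FLOWS, min_share=0.4).items():
        src_h, dst_h = rule_entropies(rule, FLOWS)
        print(rule, nbytes, round(src_h, 3), round(dst_h, 3))
```

In this toy data the rule for destination port 6881 spreads over four distinct sources and four distinct destinations, giving 2.0 bits of entropy on each side, while the rule for destination port 80 covers a single address pair and gives 0 bits. A high, roughly symmetric dispersion of this kind is the sort of signal the abstract associates with P2P traffic, although the actual decision in the thesis also depends on the IP prefix and the two-way property.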
Keywords/Search Tags: Flow Mining, Identification of P2P Flow, Multidimensional Clustering, Entropy of IP