
Research On The P2P Traffic Identification Method Based On Decision Tree

Posted on: 2016-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
GTID: 2348330488472898
Subject: Engineering
Abstract/Summary:
In the early days of the Internet, network connections generally followed the Client-Server model. Because users send only a few requests while receiving large amounts of content, upstream and downstream traffic in such networks is asymmetric. In the late 20th century, P2P protocols were proposed, giving users the opportunity to share their multimedia catalogs and to interact directly with each other over the Internet. P2P protocols enable direct distribution and sharing of content among all users. From then on, users could play a new and dynamic role, acting not only as clients but also as servers. This change in the user's role alters traditional network traffic, which is evolving toward more balanced bandwidth usage in both directions. Moreover, because most of these applications are greedy, consuming as much bandwidth as they can, a proper policy is needed to regulate their behavior. The dual role of P2P systems therefore draws special attention from network managers.

P2P applications also raise many problems concerning security and limits on Internet service. Because network bandwidth is limited, applications that consume excessive bandwidth pose a potential threat to all kinds of operators and organizations. How to identify P2P traffic quickly and effectively, and how to manage it, has therefore become a focus of current research. Addressing this issue, this thesis studies traffic feature extraction and fast traffic classification.

First, the thesis describes how the classic network transmission mode and P2P applications differ in network topology and mechanism, and presents the current state of P2P applications. It then introduces the topology, applications, and characteristics of P2P networks, as well as the four commonly used P2P traffic identification techniques. Next, we study emerging machine learning algorithms, namely KNN, SVM, Bayes, and decision tree (DT) algorithms, and analyze their advantages and disadvantages in data processing, their algorithmic complexity, and related properties. Using the Weka data mining tool on the publicly available Moore data sets, we compare these four algorithms and verify the overall advantages of the decision tree algorithm. On this basis, we study the decision-tree-based VFDT algorithm and an improved version of it, and implement both in Weka. We run simulations on different data sets and compare the two algorithms. The results show that the improved algorithm effectively overcomes the traffic drift caused by dynamic changes in the data across different times and regions.

Finally, a P2P traffic identification system is designed. It uses WinPcap to capture data packets and construct the initial data set. It selects the average packet length, the standard deviation of the packet length, the UDP ports and their usage rate, and the IP addresses as the feature set for P2P traffic. These feature vectors are then combined with the CVFDT algorithm to extract traffic features and identify the traffic. Test results show that the algorithm effectively identifies four kinds of P2P traffic, further confirming its accuracy and stability.
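The abstract does not spell out how VFDT decides when to grow the tree. As commonly described, VFDT splits a leaf once it has seen enough examples that the Hoeffding bound separates the best and second-best attributes. The sketch below is a minimal illustration under stated assumptions: information gain is the split measure (so its range is R = log2(c) for c classes), and the helper names `hoeffding_bound` and `should_split`, the confidence `delta`, and the tie threshold `tau` are hypothetical illustrative choices, not values taken from the thesis or from Weka.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 n: int, delta: float = 1e-7, tau: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test for a leaf that has seen n examples.

    Split if the best attribute's information gain beats the runner-up by
    more than epsilon, or if epsilon has shrunk below the tie threshold tau
    (the two attributes are then treated as equally good).
    """
    value_range = math.log2(n_classes)  # range of information gain for c classes
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > eps) or (eps < tau)


if __name__ == "__main__":
    # Four traffic classes: a gain gap of 0.3 is not yet conclusive after
    # 200 examples, but is after 5000.
    print(should_split(0.5, 0.2, n_classes=4, n=200))
    print(should_split(0.5, 0.2, n_classes=4, n=5000))
```

Because epsilon shrinks as 1/sqrt(n), the tree commits to a split only after enough streamed examples, which is what lets VFDT learn from traffic in a single pass without storing it.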
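The abstract names the features (average packet length, its standard deviation, UDP ports and their usage rate, IP address) but not how they are computed. The following is a minimal sketch, assuming packets have already been captured (for example with WinPcap) and parsed into a hypothetical `Packet` record. Grouping by source IP, reading "UDP usage rate" as the fraction of a host's packets that are UDP, and counting distinct UDP source ports are all our assumptions, not the thesis's definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Iterable


@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet record, as parsed from a capture.
    src_ip: str
    dst_ip: str
    proto: str      # "TCP" or "UDP"
    src_port: int
    dst_port: int
    length: int     # bytes


def host_features(packets: Iterable[Packet]) -> Dict[str, Dict[str, float]]:
    """Group packets by source IP and compute one feature vector per host."""
    by_host = defaultdict(list)
    for p in packets:
        by_host[p.src_ip].append(p)

    features = {}
    for ip, pkts in by_host.items():
        lengths = [p.length for p in pkts]
        udp = [p for p in pkts if p.proto == "UDP"]
        features[ip] = {
            "avg_len": mean(lengths),               # average packet length
            "std_len": pstdev(lengths),             # population std. deviation
            "udp_ratio": len(udp) / len(pkts),      # share of UDP packets
            "udp_ports": len({p.src_port for p in udp}),  # distinct UDP ports
        }
    return features


if __name__ == "__main__":
    sample = [
        Packet("10.0.0.1", "10.0.0.2", "UDP", 4000, 5000, 100),
        Packet("10.0.0.1", "10.0.0.3", "UDP", 4001, 5000, 300),
        Packet("10.0.0.1", "10.0.0.4", "TCP", 80, 5555, 200),
    ]
    print(host_features(sample))
```

Each resulting dictionary can be turned into one instance for a stream classifier such as CVFDT; in Weka this would mean writing the vectors out in a format like ARFF, which is not shown here.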
Keywords/Search Tags: P2P, Weka, CVFDT, Real-time Traffic Identification, Feature Extraction