In recent years, P2P (Peer-to-Peer) has rapidly become a major topic of interest in the computer industry. As a new technology that can change the application model of the Internet, P2P now accounts for 60% to 80% of total Internet traffic and has become the killer application of the broadband Internet. The traffic generated by P2P applications is unevenly distributed, shows symmetry between upstream and downstream flows, and is data-intensive. The growth of P2P applications consumes enormous network bandwidth, can cause network congestion, and reduces the performance of other applications. This huge volume of traffic has also put tremendous pressure on network operators, whose main challenge is how to manage these flows more effectively. Effective P2P traffic identification is therefore a problem that urgently needs to be solved.

However, with the rapid development of P2P technology, P2P applications have adopted a variety of techniques to avoid being easily detected, such as dynamic ports and encryption of protocol fields. As a result, P2P traffic identification faces severe challenges. Because of port hopping and the encryption of the traffic payload, identification methods based on explicit characteristics of P2P applications, such as fixed ports and payload content, have gradually become obsolete. New identification approaches have therefore begun to rely on transport-layer characteristics and on data mining methods. Identification based on the behavioral characteristics of the transport layer is more accurate. However, this method also has a significant drawback.
It is still immature, applies only to offline analysis of recorded traffic, and cannot be used for real-time traffic identification. In practice, P2P traffic must be identified in real time so that traffic can be controlled and network performance improved. Consequently, we need a more effective, real-time P2P traffic identification approach.

This thesis first reviews the principles of P2P traffic identification and several typical identification methods. We then investigate data mining methods for identifying P2P traffic, focusing mainly on the selection of application flow features for a new P2P identification approach.

This thesis is divided into six chapters. Chapter 1 briefly introduces the background and the major work of this thesis. Chapter 2 introduces the key P2P technologies and analyzes the principles of P2P traffic identification methods, together with their features and the problems they encounter in the identification process. Chapter 3 briefly introduces the application of data mining techniques to traffic identification and demonstrates the feasibility and necessity of adopting them for P2P traffic identification. In that chapter we highlight the importance of selecting traffic attributes and explain how to select them, and we introduce the relevant data mining algorithms used in the following chapters. In Chapter 4, we design and implement a program that computes traffic attributes, and we use it to compute several attributes for the traffic of some typical network applications. We plot the computed attributes and analyze the results. In Chapter 5, we apply the data mining algorithms introduced earlier to these attributes and test how effective the attributes are for P2P traffic identification. The last chapter presents future work.