| Peer-to-Peer (P2P) applications are now among the most popular applications on the Internet. However, the network congestion caused by P2P traffic has become a serious problem that degrades network performance, so identifying and controlling P2P traffic effectively is an issue that urgently needs a solution. For open-protocol P2P software, this paper studies the port-number and characteristic-string features of the traffic and the algorithms used to identify it; for unknown-protocol software, it studies the connection-count, packet-count, and connection-time features of the traffic and its identification by machine learning. First, for open-protocol P2P software, this paper proposes a string matching algorithm tailored to the specific P2P software protocol, which avoids the time cost of global matching. Theoretical analysis and experimental verification show that the algorithm is correct and effective and that its time complexity is held at a constant level. Second, for unknown-protocol P2P software, this paper proposes two improved algorithms, based on the Fuzzy Support Vector Machine and Ant Clustering respectively, and applies them to traffic identification for the first time. Theoretical analysis and extensive experimental validation show that the two improved algorithms not only identify P2P traffic effectively but also outperform the traditional Fuzzy Support Vector Machine and Ant Clustering algorithms. Finally, this paper designs a prototype system based on traffic identification. The system uses port-based identification and the improved string matching algorithm for open-protocol P2P traffic, and the improved algorithms based on the Fuzzy Support Vector Machine and Ant Clustering for unknown traffic. 
The system also combines the two approaches effectively. It can not only identify traffic online correctly, but also avoid degrading users' QoS through long recognition times. |
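The idea of combining port-based identification with protocol-specific string matching can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper, only an example of the general technique under stated assumptions: each open protocol is assumed to place a fixed byte string at a known offset in the first payload of a connection, so the check compares a bounded number of bytes instead of scanning the whole payload. The table contents, the function names (`identify_by_port`, `identify_by_signature`, `identify`), and the choice to try ports before signatures are all hypothetical. The BitTorrent and Gnutella handshake prefixes and the default ports are commonly documented values, included here only for illustration.

```python
from typing import Optional

# Hypothetical signature table: (protocol name, byte offset, expected bytes).
# Each pattern is checked only at its fixed offset, never searched for globally.
SIGNATURES = [
    ("BitTorrent", 0, b"\x13BitTorrent protocol"),
    ("Gnutella", 0, b"GNUTELLA CONNECT/"),
]

# Illustrative default ports; real deployments often use other ports.
WELL_KNOWN_PORTS = {6346: "Gnutella", 4662: "eDonkey"}
WELL_KNOWN_PORTS.update({port: "BitTorrent" for port in range(6881, 6890)})


def identify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Return a protocol name if either endpoint uses a listed default port."""
    return WELL_KNOWN_PORTS.get(dst_port) or WELL_KNOWN_PORTS.get(src_port)


def identify_by_signature(payload: bytes) -> Optional[str]:
    """Compare the payload against each signature at its fixed offset.

    The work is bounded by the table size and the pattern lengths,
    so it does not grow with the length of the payload.
    """
    for name, offset, pattern in SIGNATURES:
        if payload[offset:offset + len(pattern)] == pattern:
            return name
    return None


def identify(src_port: int, dst_port: int, payload: bytes) -> Optional[str]:
    """Try the port check first, then fall back to the signature check."""
    return identify_by_port(src_port, dst_port) or identify_by_signature(payload)
```

A flow that returns `None` from `identify` would, in a system like the one described above, be handed to the machine-learning stage for unknown traffic; that stage is not sketched here.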