Research Of P2P Traffic Identification Based On Machine Learning Algorithm

Posted on: 2015-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Tan
Full Text: PDF
GTID: 2308330479451616
Subject: Communication and Information System
Abstract/Summary:
The emergence and rapid growth of P2P applications have markedly changed the composition of Internet traffic. P2P traffic has become the main component of Internet traffic, which creates many problems for network management and places higher demands on traffic monitoring. Meanwhile, to evade identification, P2P applications are rapidly moving toward dynamic ports and encrypted payloads, so traditional traffic identification techniques have difficulty identifying P2P traffic effectively. Because it does not depend on ports or payload, P2P traffic identification based on machine learning has become a research hotspot in recent years.
This paper first analyzes P2P technology, including its definition, network structure, characteristics, and application types, and then summarizes current P2P traffic identification techniques, with a focus on machine learning algorithms for P2P traffic identification. It then studies the K-means and decision tree algorithms and proposes a new P2P traffic identification algorithm based on K-means and decision trees, which targets the loss of accuracy that supervised P2P traffic identification suffers when labeled samples are scarce.
To improve the accuracy of K-means clustering, and thereby provide accurately labeled samples for training the decision tree, an improved semi-supervised K-means clustering method is proposed first. Instead of selecting cluster centers at random, it initializes them using a greedy algorithm and labeled flows, and it uses maximum likelihood estimation to construct a mapping from the clusters to the predefined set of traffic classes. The improved semi-supervised K-means clustering is then used to preprocess the labeled and unlabeled samples, and the decision tree model is trained on the processed samples.
Experiments show that the method is able to maintain a higher recognition accuracy.
Keywords/Search Tags: P2P, traffic identification, machine learning, k-means, decision tree
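
The clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the greedy rule, the flow features, or the exact likelihood model, so the sketch assumes (a) centers are seeded from the per-class means of the labeled flows, (b) any remaining centers are picked greedily by a farthest-first rule, and (c) the maximum likelihood mapping reduces to choosing, for each cluster, the class most frequent among its labeled flows. All function names, the two-dimensional feature vectors, and the "unknown" label are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(p, centers):
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def init_centers(labeled, unlabeled, k):
    """Assumed seeding: one center per labeled class, then farthest-first."""
    by_class = defaultdict(list)
    for x, y in labeled:
        by_class[y].append(x)
    centers = [mean(pts) for _, pts in sorted(by_class.items())]
    pool = [x for x, _ in labeled] + list(unlabeled)
    while len(centers) < k:
        # Greedy step: take the point farthest from all current centers.
        centers.append(
            max(pool, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers[:k]


def kmeans(points, centers, iters=50):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        updated = []
        for j, c in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == j]
            updated.append(mean(members) if members else c)
        if updated == centers:
            break
        centers = updated
    return centers


def map_clusters(labeled, centers):
    """Map each cluster to the class most frequent among its labeled flows."""
    votes = defaultdict(Counter)
    for x, y in labeled:
        votes[nearest(x, centers)][y] += 1
    return {j: c.most_common(1)[0][0] for j, c in votes.items()}


def pseudo_label(labeled, unlabeled, k):
    """Cluster all flows, then label each unlabeled flow via its cluster."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centers = kmeans(points, init_centers(labeled, unlabeled, k))
    mapping = map_clusters(labeled, centers)
    # Clusters containing no labeled flow are marked "unknown".
    return [(x, mapping.get(nearest(x, centers), "unknown")) for x in unlabeled]


# Hypothetical two-feature flows, for illustration only.
labeled = [((1.0, 1.0), "p2p"), ((1.2, 0.9), "p2p"),
           ((8.0, 8.0), "web"), ((8.2, 7.9), "web")]
unlabeled = [(0.9, 1.1), (1.1, 1.2), (7.9, 8.1), (8.1, 8.3)]
result = pseudo_label(labeled, unlabeled, k=2)
print(result)
```

The labeled flows together with the pseudo-labeled output could then serve as the training set for a decision tree learner, for example scikit-learn's `DecisionTreeClassifier`; which tree algorithm the thesis uses is not stated in the abstract.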