
Research On Classification Of P2P Traffic Based On Machine Learning

Posted on: 2016-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Ding
GTID: 2298330467961904
Subject: Computer application technology
Abstract/Summary:
In recent years, P2P networks, with their advantages of equality, freedom and openness, have grown rapidly in many areas of the Internet, and P2P traffic now makes up a major share of Internet traffic. This growth has also made P2P networks a threat to network security. Moreover, the huge volume of P2P traffic consumes network bandwidth and leaves too little of it for non-P2P applications. These negative effects pose new challenges to network operators and administrators, and strengthening the management of P2P traffic through traffic identification technology is one of the resulting research directions. However, as Internet technology develops, more and more P2P applications use dynamic ports, tunneling and protocol encryption to adapt to the network environment, so traditional identification techniques, such as port-based identification and application-layer feature matching, can no longer meet the requirements of P2P identification.

As machine learning theory has matured, it has been widely applied to medical diagnosis, image recognition, audio recognition and network security. Machine learning uses mathematical statistics and algorithm theory to build effective learning models that capture the inherent rules of the data. It needs relatively little information, does not involve users' private data, and can cope with a changing data environment, which makes it well suited to dynamic P2P networks.

This paper studies P2P traffic classification based on machine learning, which builds a classification model from the statistical characteristics of P2P network flows. On the one hand, selecting effective features from the large number of available flow statistics strongly affects the classification results; on the other hand, how the classification model is built is critical to the final result. The main contents of this paper are as follows:

1. P2P traffic data constructed with statistical methods have hundreds of features. To handle so many features, an improved algorithm based on ReliefF, named ReliefF&, is proposed. ReliefF& removes redundant features from the feature subset produced by ReliefF, which both reduces the dimension of the feature space and improves the classification ability of the feature subset.
2. In the study of ensemble learning algorithms, this paper comparatively analyzes how AdaBoost and Bagging work and introduces the concept of selective ensembling based on Bagging. A selective ensemble method is designed that uses the Q statistic to measure the difference between each pair of base classifiers and deletes the base classifier with the minimum difference from the ensemble system. Experiments show that the PBagging algorithm, using decision trees as base classifiers, improves on the performance of Bagging.
3. In the study of ensembling different types of classifiers, this paper proposes an ensemble model made up of Bayesian, SVM and decision tree classifiers. Experiments on network flow data show that the ensemble model performs better than each single classifier.

The results of this paper effectively improve the feature selection process and thereby further raise the recognition rate of P2P traffic. The research offers a new solution for P2P traffic classification, which will not only promote the management of P2P traffic but also help build a more reliable network environment and a harmonious network atmosphere.
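The feature selection step in point 1 can be illustrated with a minimal sketch. The ReliefF part follows the standard formulation for numeric features (k nearest hits lower a feature's weight, k nearest misses from each other class raise it, scaled by class prior), with every instance used once instead of random sampling. The abstract does not say how ReliefF& detects redundancy, so `select_non_redundant` is a hypothetical stand-in: a greedy filter that skips any feature whose absolute Pearson correlation with an already kept, higher-weighted feature reaches a threshold (0.9 here, an arbitrary choice). All function names are illustrative.

```python
import math
from collections import Counter


def _diff(a, x, y, ranges):
    """Normalised difference of numeric feature a between instances x and y."""
    if ranges[a] == 0:
        return 0.0
    return abs(x[a] - y[a]) / ranges[a]


def relieff(X, y, k=3):
    """ReliefF feature weights for numeric features; every instance is sampled once."""
    n, d = len(X), len(X[0])
    ranges = [max(r[a] for r in X) - min(r[a] for r in X) for a in range(d)]
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Manhattan distance over normalised feature differences.
        dist = [sum(_diff(a, X[i], X[j], ranges) for a in range(d)) for j in range(n)]
        others = sorted((j for j in range(n) if j != i), key=dist.__getitem__)
        # k nearest hits (same class) pull the weights down.
        hits = [j for j in others if y[j] == y[i]][:k]
        if hits:
            for a in range(d):
                w[a] -= sum(_diff(a, X[i], X[j], ranges) for j in hits) / (n * len(hits))
        # k nearest misses of every other class push the weights up, scaled by prior.
        for c in prior:
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            if not misses:
                continue
            scale = prior[c] / (1.0 - prior[y[i]])
            for a in range(d):
                w[a] += scale * sum(_diff(a, X[i], X[j], ranges) for j in misses) / (n * len(misses))
    return w


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def select_non_redundant(X, weights, threshold=0.9, min_weight=0.0):
    """Hypothetical redundancy filter: keep features by descending weight,
    skipping those highly correlated with a feature already kept."""
    cols = list(zip(*X))
    order = sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)
    kept = []
    for a in order:
        if weights[a] <= min_weight:
            continue  # irrelevant or harmful under ReliefF
        if all(abs(pearson(cols[a], cols[b])) < threshold for b in kept):
            kept.append(a)
    return kept
```

On real flow statistics, `k`, `threshold` and `min_weight` would need tuning; a duplicated feature (for example the same byte count in different units) receives the same ReliefF weight as its original, which is exactly the case that plain ReliefF cannot filter out on its own.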
Keywords: P2P traffic, machine learning, feature selection, ensemble learning, Bagging, decision tree
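The selective ensembling idea in point 2 relies on the Q statistic. For two classifiers evaluated on the same validation samples, with N11 the count both classify correctly, N00 both wrong, and N10/N01 only one correct, Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10). Q near 1 means the two err on the same samples (little diversity), and Q near −1 means they complement each other. The sketch below is an assumption about the pruning rule: "minimum difference with the ensemble system" is read as the highest mean pairwise Q with the remaining members, and that member is dropped greedily until a target size is reached. `majority_vote` is one hypothetical way to combine heterogeneous members such as the Bayesian, SVM and decision tree classifiers in point 3; the abstract does not state the combination rule.

```python
from collections import Counter


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers, given per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    den = n11 * n00 + n01 * n10
    # Q is undefined when the denominator is zero; returning 0.0 is an assumption.
    return (n11 * n00 - n01 * n10) / den if den else 0.0


def _mean_q(member, members, correct):
    """Mean pairwise Q of one member against all other current members."""
    others = [o for o in members if o != member]
    return sum(q_statistic(correct[member], correct[o]) for o in others) / len(others)


def prune_by_q(correct, target_size):
    """Greedily remove the least diverse member (highest mean Q) until target_size remain.

    correct: dict mapping a classifier name to its list of per-sample correctness flags.
    """
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    members = list(correct)
    while len(members) > target_size:
        least_diverse = max(members, key=lambda m: _mean_q(m, members, correct))
        members.remove(least_diverse)
    return members


def majority_vote(labels):
    """Combine one predicted label per classifier by plurality vote."""
    return Counter(labels).most_common(1)[0][0]
```

In a Bagging setting, `correct` would hold each bootstrap-trained tree's correctness on a held-out sample, and the surviving members would then vote. A classifier that duplicates another (Q = 1) is the first candidate for removal, since it adds cost without adding diversity.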