
Research On P2P Traffic Classification Method Based On Pearson Coefficient Distance Weight KNN Algorithm

Posted on: 2020-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yin
GTID: 2428330602963175
Subject: Computer technology
Abstract/Summary:
As P2P technology has continued to develop, its advantages in file sharing, data storage, multimedia, and other services have made P2P traffic the main component of Internet traffic. However, because P2P networks are peer-to-peer overlay networks, they pose great challenges for the management and maintenance work of Internet service providers. These challenges include the following. First, P2P traffic occupies most of the Internet's capacity, so non-P2P traffic receives too small a share of the bandwidth and the availability of non-P2P applications cannot be guaranteed. Second, as the technology is continually updated and iterated, more and more P2P applications use random ports and protocol encryption, which creates information security problems. Third, traditional P2P traffic identification techniques rely mainly on application ports, application-layer signatures, and behavioral characteristics, and as P2P technology grows more complex they can no longer meet current needs. Machine learning, the science of using computers to simulate human behavior, is widely applied. A machine learning method feeds changing external information to a learning algorithm to build a model that can improve its learning ability as the external environment changes. This thesis studies P2P traffic classification methods: by comparison with traditional P2P traffic classification methods, it examines the advantages of machine learning for classifying P2P traffic. The main work of this thesis is as follows:
1. Building on a study of the ReliefF feature selection algorithm and an analysis of its strengths and weaknesses, an improved algorithm, MS-ReliefF, is proposed. MS-ReliefF reduces the original feature set along two dimensions, vertical and horizontal: it selects optimal feature pairs using joint feature weights, computes the correlation between features to remove redundant ones, and finally obtains an optimal feature subset.
2. Based on an analysis of the traditional KNN algorithm, an improved algorithm, PSDW-KNN, is proposed. PSDW-KNN extends traditional KNN with feature distance weights and the Pearson correlation coefficient: different features are assigned corresponding weights, and the degree of correlation between samples is calculated with the Pearson correlation coefficient. Experiments compare the traditional KNN algorithm, the DW-KNN algorithm, and the improved PSDW-KNN algorithm. The results show that PSDW-KNN achieves higher classification accuracy when the number of nearest neighbors K is small and the number of training samples is large.
3. To give the machine learning model more computing power, this thesis combines the PSDW-KNN algorithm and the Spark MLlib machine learning library with the Spark distributed computing framework to build a prototype system, further improving the model's classification and recognition efficiency.
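The ReliefF step that MS-ReliefF builds on, followed by a correlation-based redundancy filter, can be sketched as follows. This is a minimal sketch of standard multi-class ReliefF, not the thesis's MS-ReliefF: the vertical/horizontal reduction and the joint feature-pair weights are not specified in the abstract and are omitted. Assumptions: features are numeric, every sample is visited once instead of drawing m random samples, and redundancy is removed by keeping features in descending weight order and dropping any feature whose absolute Pearson correlation with an already kept one reaches a hypothetical threshold `max_corr`. The names `relieff_weights` and `select_features` are illustrative.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (sx * sy)


def relieff_weights(X, y, k=3):
    """Standard multi-class ReliefF; visits every sample instead of sampling m."""
    n, d = len(X), len(X[0])
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # avoid dividing by zero

    def diff(a, r1, r2):
        # Per-feature difference normalised to [0, 1].
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over normalised features, used to find neighbours.
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i, r in enumerate(X):
        # Group the other samples by class, nearest first.
        by_class = {}
        for j, other in enumerate(X):
            if j != i:
                by_class.setdefault(y[j], []).append((dist(r, other), other))
        for cands in by_class.values():
            cands.sort(key=lambda t: t[0])
        hits = [o for _, o in by_class.get(y[i], [])[:k]]
        for a in range(d):
            # Nearest hits (same class) that differ on feature a lower its weight.
            if hits:
                w[a] -= sum(diff(a, r, h) for h in hits) / (n * len(hits))
            # Nearest misses from each other class raise it, scaled by class prior.
            for c, cands in by_class.items():
                if c == y[i]:
                    continue
                misses = [o for _, o in cands[:k]]
                scale = prior[c] / (1.0 - prior[y[i]])
                w[a] += scale * sum(diff(a, r, m) for m in misses) / (n * len(misses))
    return w


def select_features(X, y, k=3, max_corr=0.9):
    """Rank features by ReliefF weight, then drop redundant (highly correlated) ones."""
    w = relieff_weights(X, y, k)
    cols = list(zip(*X))
    kept = []
    for a in sorted(range(len(w)), key=lambda idx: w[idx], reverse=True):
        if w[a] <= 0:
            continue  # non-positive weight: feature does not separate the classes
        if all(abs(pearson(cols[a], cols[b])) < max_corr for b in kept):
            kept.append(a)
    return kept, w


if __name__ == "__main__":
    # Toy data: feature 1 = 2 * feature 0, feature 2 is class-independent noise.
    X = [[0.1, 0.2, 1.0], [0.2, 0.4, 3.0], [0.3, 0.6, 5.0],
         [0.7, 1.4, 1.0], [0.8, 1.6, 3.0], [0.9, 1.8, 5.0]]
    y = [0, 0, 0, 1, 1, 1]
    kept, w = select_features(X, y, k=2)
    print("weights:", [round(v, 4) for v in w])
    print("kept features:", kept)
```

In the toy example, features 0 and 1 receive the same positive weight but are perfectly correlated, so only one of them survives the redundancy filter, while the noise feature gets a negative weight and is discarded.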
Keywords/Search Tags: P2P traffic, Machine learning, Feature selection, KNN, Spark MLlib
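The PSDW-KNN idea summarized in the abstract (per-feature weights in the distance, plus Pearson correlation between samples) can be illustrated with the sketch below. The abstract does not say how the two are combined, so the voting rule here is a hypothetical choice: neighbors are ranked by a feature-weighted Euclidean distance, and each neighbor's vote is its Pearson similarity to the query, mapped from [-1, 1] to [0, 1], divided by its distance (the inverse-distance weighting associated with DW-KNN). The feature weights are assumed to be non-negative and are passed in directly; they could come from a selection step such as ReliefF. The function names and the toy data are illustrative.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return cov / (sx * sy)


def weighted_distance(a, b, w):
    """Euclidean distance with squared difference i scaled by weight w[i] >= 0."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def psdw_knn_predict(train_X, train_y, query, feature_w, k=3, eps=1e-9):
    """Classify one query vector with a hypothetical PSDW-KNN-style vote."""
    # 1. Keep the k training samples nearest under the feature-weighted distance.
    neighbours = sorted(
        ((weighted_distance(query, x, feature_w), x, label)
         for x, label in zip(train_X, train_y)),
        key=lambda t: t[0],
    )[:k]
    votes = defaultdict(float)
    for d, x, label in neighbours:
        # 2. Pearson correlation between query and neighbour, mapped to [0, 1].
        sim = (1.0 + pearson(query, x)) / 2.0
        # 3. Inverse-distance weighting; eps guards against d == 0.
        votes[label] += sim / (d + eps)
    # If all votes tie, max() returns the first label encountered.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical 3-feature flow vectors, already scaled to [0, 1].
    train_X = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.85, 0.15, 0.7],
               [0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.15, 0.85, 0.3]]
    train_y = ["p2p", "p2p", "p2p", "other", "other", "other"]
    weights = [1.0, 1.0, 1.0]  # uniform here; e.g. ReliefF weights clipped at 0
    print(psdw_knn_predict(train_X, train_y, [0.82, 0.12, 0.75], weights))  # prints p2p
    print(psdw_knn_predict(train_X, train_y, [0.12, 0.88, 0.18], weights))  # prints other
```

Ranking by weighted distance keeps the neighbor search identical to feature-weighted KNN, while the Pearson term only changes how much each neighbor's vote counts; other combinations, such as ranking by a blend of distance and correlation, are equally plausible readings of the abstract.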