
Identification Of Encrypted Traffic As Small Sample Of Class-imbalance

Posted on: 2014-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2268330422950616
Subject: Computer Science and Technology
Abstract/Summary:
As application types have diversified, the Internet has gradually become an indispensable communication platform in daily life. People enjoy the vast amount of information the Internet brings, and have also come to realize the importance of security and privacy. The spread of encryption technology makes network management more difficult, so identifying encrypted traffic within massive volumes of data is very important. Because encrypted traffic makes up only a very small proportion of traffic in real network environments, traditional identification methods tend to misclassify it, and the recognition rate for encrypted traffic is low. In view of this class imbalance in network traffic, this thesis studies the identification of encrypted traffic.

First, we study the class-imbalance problem, analyze how data set characteristics influence classification, and discuss the traditional criteria for evaluating classifier performance. We summarize machine learning methods for traffic identification and choose two kinds of methods for handling class-imbalanced data sets. In addition, we study over-sampling techniques, discuss whether a mutual information metric is feasible as a criterion, and optimize classifier performance on the basis of the Neyman-Pearson criterion.

Second, building on this research into encrypted traffic recognition and class-imbalance processing, we propose and implement a static detection and classification system that improves the identification of encrypted traffic as a small-sample class while keeping the false alarm rate under control to some degree. We preprocess the imbalanced data with an over-sampling method, and design a clustering method based on maximum mutual information to optimize the number of clusters used by the K-Means algorithm. A risk function and cost-sensitive methods are used to optimize classifier accuracy on small samples.
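Two of the ingredients named above can be illustrated with a short sketch. It is not the thesis's implementation: it assumes the standard SMOTE formulation (each synthetic sample is a random interpolation between a minority sample and one of its k nearest minority-class neighbours) and reads the Neyman-Pearson criterion as "maximize detection of the encrypted class subject to a false alarm rate of at most alpha", applied to a single scalar classifier score. The function names `smote` and `neyman_pearson_threshold`, and the toy data, are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples (SMOTE): each one lies on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the indices of the k nearest minority neighbours of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for t in range(n_new):
        i = t % len(minority)                  # cycle through the minority samples
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()                     # uniform interpolation weight in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def neyman_pearson_threshold(neg_scores: Sequence[float], alpha: float) -> float:
    """Smallest cut-off tau such that the rule `score > tau` flags at most
    floor(alpha * n) of the n negative (non-encrypted) validation scores.
    Lowering tau any further would exceed the false alarm budget, so this tau
    gives the highest detection rate the constraint allows."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ranked = sorted(neg_scores, reverse=True)
    m = math.floor(alpha * len(ranked))        # number of false alarms allowed
    return ranked[m] if m < len(ranked) else -math.inf


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote(minority, n_new=8, k=2, seed=1))
    negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    print(neyman_pearson_threshold(negatives, alpha=0.1))  # -> 0.9
```

Over-sampling is applied only to the training data of the minority class; the threshold is then fitted on held-out negative scores, so the false alarm constraint is checked on data the classifier has not been trained on.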
We also construct a sequence of binary classifiers for the multi-class problem in order to minimize the overall misclassification rate, which further improves classifier performance on small samples. In addition, the classifier sequence is able to identify unknown application types.

Finally, we test the system model on publicly available data sets, study the clustering model and the per-cluster classification models separately, and analyze the factors that affect performance. Experimental results show that the system significantly improves identification accuracy for Skype traffic, which indicates that the model has good practical value.
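The classifier sequence described above can be sketched as an ordered cascade of one-vs-rest binary classifiers: a flow is offered to each stage in turn, the first stage that accepts it assigns its label, and a flow that every stage rejects is reported as an unknown application. This is a minimal sketch under stated assumptions: the abstract does not say how the stage order is chosen to minimize the overall misclassification rate, so the order is taken as given, and the `Stage` type, the scorer functions, the thresholds, and the feature values below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    label: str                                  # application class this stage recognises
    score: Callable[[Sequence[float]], float]   # one-vs-rest score, higher = more likely this class
    threshold: float                            # stage accepts the flow if score > threshold


def classify(flow: Sequence[float], stages: List[Stage], unknown: str = "unknown") -> str:
    """Pass a flow through the ordered stages; the first stage that accepts it
    decides the label. A flow rejected by every stage is labelled `unknown`."""
    for stage in stages:
        if stage.score(flow) > stage.threshold:
            return stage.label
    return unknown


if __name__ == "__main__":
    # Hypothetical two-feature flows and hand-written scorers, for illustration only.
    stages = [
        Stage("skype", lambda f: f[0], 0.8),
        Stage("http", lambda f: f[1], 0.5),
    ]
    print(classify([0.9, 0.9], stages))  # -> skype (first accepting stage wins)
    print(classify([0.1, 0.7], stages))  # -> http
    print(classify([0.1, 0.1], stages))  # -> unknown
```

Because earlier stages take precedence, their thresholds control which flows later stages ever see; each stage's threshold could, for example, be set with a false-alarm constraint so that a confident but wrong early stage does not absorb traffic from the rarer classes behind it.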
Keywords/Search Tags: Class-imbalance, Encrypted Traffic Classification, Mutual Information, Neyman-Pearson Criterion, SMOTE