Identification Of Encrypted Traffic As Small Sample Of Class-imbalance

Posted on:2014-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhang

GTID:2268330422950616

Subject:Computer Science and Technology

Abstract/Summary:

With the diversification of application types, the Internet has gradually become an indispensable communication platform in daily life. People enjoy the vast amount of information that the Internet conveniently provides, and they also recognize the importance of security and privacy. The adoption of encryption technology makes network management more difficult, so identifying encrypted traffic within massive volumes of data is very important. In real network environments, encrypted traffic accounts for only a very small proportion of the total, so traditional identification methods are prone to misclassifying it and the recognition rate for encrypted traffic is low. In view of this imbalance in network traffic, this thesis studies the identification of encrypted traffic.

First, we survey related work on class-imbalance problems, analyze how data set characteristics influence classification, and discuss the traditional criteria for evaluating classifier performance. We summarize the machine learning methods used in traffic identification and choose two kinds of methods for dealing with class-imbalanced data sets. In addition, we study over-sampling techniques, discuss whether applying a mutual information metric is feasible, and optimize classifier performance on the basis of the Neyman-Pearson criterion.

Second, building on this research into encrypted traffic recognition and class-imbalance handling, we propose and implement a static detection and classification system that improves the identification of encrypted traffic as a small-sample class while keeping the false alarm rate under control to a certain degree. We use an over-sampling method to preprocess the imbalanced data and design a clustering method based on maximum mutual information to optimize the number of clusters for the K-Means algorithm. We use a risk function and cost-sensitive methods to optimize classifier accuracy on small samples. We construct a sequence of binary classifiers for the multi-class problem to minimize the overall misclassification rate, which also improves classifier performance on small samples. In addition, the classifier sequence is able to identify unknown application types.

Finally, we test the system model on publicly available data sets, examine the clustering model and the within-cluster classification model separately, and analyze the factors that affect performance. Experimental results show that the system significantly improves identification accuracy for Skype traffic, which indicates that the model is practical.
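
The abstract mentions optimizing the classifier under the Neyman-Pearson criterion while controlling the false alarm rate. The following Python sketch illustrates one common empirical form of that idea: pick the lowest decision threshold whose false alarm rate on known non-encrypted (negative) samples stays within a bound alpha, which in turn keeps the detection rate on the rare encrypted class as high as that bound allows. It is not the thesis's implementation; the function names `np_threshold` and `detection_and_false_alarm`, the scores, and the value of alpha are all hypothetical and chosen only for illustration.

```python
import math


def np_threshold(negative_scores, alpha):
    """Return the lowest threshold t whose empirical false alarm rate
    (fraction of negative scores strictly greater than t) is <= alpha.

    Hypothetical helper: scores are classifier outputs where a higher
    value means "more likely encrypted".
    """
    if not negative_scores:
        raise ValueError("at least one negative score is required")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ordered = sorted(negative_scores, reverse=True)
    # At most k negatives are allowed to exceed the threshold.
    k = math.floor(alpha * len(ordered))
    if k >= len(ordered):
        return float("-inf")  # every sample may be flagged
    # The (k+1)-th largest negative score: only the k scores above it
    # can be strictly greater, so the bound holds even with ties.
    return ordered[k]


def detection_and_false_alarm(scores, labels, threshold):
    """Compute (detection rate, false alarm rate) for label 1 = encrypted."""
    tp = fp = pos = neg = 0
    for score, label in zip(scores, labels):
        flagged = int(score > threshold)
        if label == 1:
            pos += 1
            tp += flagged
        else:
            neg += 1
            fp += flagged
    return tp / pos, fp / neg


if __name__ == "__main__":
    # Imbalanced toy data: 10 non-encrypted flows, 4 encrypted flows.
    negatives = [0.65, 0.60, 0.50, 0.30, 0.20, 0.10, 0.05, 0.35, 0.15, 0.25]
    positives = [0.95, 0.80, 0.62, 0.40]
    scores = negatives + positives
    labels = [0] * len(negatives) + [1] * len(positives)

    t = np_threshold(negatives, alpha=0.1)
    detection, false_alarm = detection_and_false_alarm(scores, labels, t)
    print("threshold:", t)
    print("detection rate:", detection, "false alarm rate:", false_alarm)
```

In practice the threshold would be chosen on held-out negative samples rather than on the data used to train the classifier, so that the false alarm bound is not estimated optimistically.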

Keywords/Search Tags:

Class-imbalance, Encrypted Traffic Classification, Mutual Information, Neyman-Pearson Criteria, SMOTE

Related items

1	Research On Key Algorithms For Class-imbalanced Network Encryption Traffic Classification
2	Comparison of two human detection algorithms that apply Bayesian and Neyman-Pearson test criteria using infrared images
3	FLOWGAN:Research On Key Technology Of Encrypted Traffic Identification Based On Generative Adversarial Network
4	Research On Class Imbalanced Network Encrypted Traffic Identification
5	Research On The Application Of Generative Adversarial Networks In Class Imbalance
6	Studying Class Imbalance Characteristics And Classification Methods On Internet Traffic Flows
7	Relationships Between Evaluation Criteria Of Feature Selection And Analysis On Class Imbalance Problem Over Vhr Remote Sensing Imagery
8	Design And Implementation Of A Migration Learning-based Encrypted Traffic Classification System
9	Improved Grouped SMOTE With Noise Filtering Mechanism
10	Research On Key Issues In Internet Traffic Classification