Research On Class Imbalanced Network Encrypted Traffic Identification

Posted on: 2021-08-11  Degree: Master  Type: Thesis
Country: China  Candidate: D Wang
GTID: 2518306107482024  Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Industrial Internet, most of its traffic is now encrypted, and the class distribution of this encrypted traffic is imbalanced. Identifying the classes of imbalanced encrypted traffic in the Industrial Internet, and thereby understanding the industrial services it carries, is important for quality-of-service assurance, network planning and management, and security in industrial networks. This thesis studies class-imbalanced encrypted traffic identification. The main work is summarized as follows.

(1) Machine-learning methods for encrypted traffic identification are studied, chiefly the Classification And Regression Tree (CART) algorithm, the C4.5 algorithm, and the Random Forest (RF) algorithm. The three algorithms are evaluated on two class-imbalanced encrypted traffic datasets. The results show that, on imbalanced encrypted traffic, the recognition rate for minority-class traffic is low. The research question of this thesis is therefore how to raise the recognition rate of minority-class encrypted traffic and, with it, the overall recognition rate on class-imbalanced encrypted traffic.

(2) On the feature-selection side, the thesis analyzes the shortcomings of the information gain algorithm and the ReliefF algorithm for class-imbalanced encrypted traffic identification. To address them, an RF-based multi-objective feature selection algorithm is adopted. It evaluates each candidate feature subset by the AUC of the RF classification model, and uses the NSGA-II algorithm to optimize two objective functions simultaneously: maximizing AUC and minimizing the feature-subset dimension. This yields the feature subset most favorable to class-imbalanced encrypted traffic identification. Experimental results show that the algorithm improves the recognition rate of minority-class encrypted traffic and thereby the overall recognition rate on class-imbalanced encrypted traffic.

(3) On the algorithm side, the thesis analyzes the shortcomings of Bagging and Boosting ensemble learning models for class-imbalanced encrypted traffic identification. To address them, an AdaBoost.M1-RF ensemble fusion algorithm is used, in which an RF learner serves as the individual (base) learner of AdaBoost.M1. Because RF is itself a Bagging ensemble, the fused model combines the advantages of Bagging and Boosting, which makes it better suited to class-imbalanced encrypted traffic identification. Experimental results show that the AdaBoost.M1-RF algorithm improves the recognition rate of minority-class encrypted traffic and thereby the overall recognition rate on class-imbalanced encrypted traffic.
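The bi-objective search in (2) can be illustrated with the survivor-selection step of NSGA-II: fast non-dominated sorting on the pair (AUC to maximize, subset size to minimize), followed by crowding distance to choose among members of the last front that does not fit. This is a minimal, stdlib-only sketch under stated assumptions: each candidate feature subset is assumed to have been scored already (in the thesis, by the AUC of the RF model); the function names `dominates`, `non_dominated_fronts`, `crowding_distance`, and `nsga2_select` are hypothetical; and the crossover and mutation operators that generate new subsets are omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: (AUC to maximize, feature-subset size to minimize).
Point = Tuple[float, int]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def non_dominated_fronts(points: Sequence[Point]) -> List[List[int]]:
    """Fast non-dominated sorting (Deb et al.): returns fronts as index lists."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # indices i dominates
    counts = [0] * n                                         # how many dominate i
    for i in range(n):
        for j in range(n):
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(points: Sequence[Point], front: List[int]) -> Dict[int, float]:
    """Per-objective normalized gap to neighbours; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(2):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def nsga2_select(points: Sequence[Point], k: int) -> List[int]:
    """Keep k indices: whole fronts first, then the least crowded of the split front."""
    chosen: List[int] = []
    for front in non_dominated_fronts(points):
        if len(chosen) + len(front) <= k:
            chosen.extend(front)
        else:
            dist = crowding_distance(points, front)
            front = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(front[: k - len(chosen)])
            break
    return chosen
```

With purely illustrative scores such as `[(0.90, 3), (0.95, 5), (0.85, 2), (0.80, 4), (0.95, 6), (0.70, 1)]` (not results from the thesis), the subsets of size 4 and 6 are dominated and land in the second front, so `nsga2_select(points, 4)` keeps the four non-dominated subsets. Keeping the whole Pareto front rather than a single weighted score is what lets the search trade a small AUC loss for a much smaller feature set.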
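The fusion in (3) can be sketched as AdaBoost.M1 with a pluggable base learner. In this minimal sketch, the boosting weights are applied by resampling, training stops when a base learner's weighted error ε reaches 1/2 (as AdaBoost.M1 prescribes), and each learner votes with weight log(1/β), where β = ε/(1 − ε). As an assumption to keep the example dependency-free, the "random forest" is a toy stand-in: bagged decision stumps, each fit on a bootstrap sample and a random feature subset of size ⌊√d⌋, whereas a real RF grows full trees. All function names (`fit_stump`, `fit_toy_forest`, `adaboost_m1`) are hypothetical and are not the thesis's code.

```python
import math
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 decision tree: best midpoint threshold on one of `features`."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: a constant predictor, kept if no split reduces the error count.
    best = (sum(lab != majority for lab in y), features[0], math.inf, majority, majority)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_toy_forest(X, y, rng, n_trees=7):
    """Toy RF stand-in: bagged stumps on random feature subsets, majority vote."""
    n, d = len(X), len(X[0])
    k = max(1, int(math.sqrt(d)))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (Bagging)
        feats = rng.sample(range(d), k)              # random feature subspace
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: Counter(tree(row) for tree in trees).most_common(1)[0][0]


def adaboost_m1(X, y, base_fit, rounds=10, seed=0):
    """AdaBoost.M1 with boosting weights applied by weighted resampling."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # (learner, vote weight log(1/beta))
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=w, k=n)  # resample by current weights
        h = base_fit([X[i] for i in idx], [y[i] for i in idx], rng)
        wrong = [h(row) != lab for row, lab in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err >= 0.5:  # M1 stops once a learner is no better than 1/2
            break
        beta = max(err, 1e-10) / (1 - err)
        ensemble.append((h, math.log(1 / beta)))
        if err == 0:  # perfect on the training set: reweighting would be a no-op
            break
        # Shrink weights of correctly classified samples, then renormalize,
        # so misclassified (often minority-class) samples gain weight.
        w = [wi * (1 if bad else beta) for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not ensemble:
        raise ValueError("first base learner had weighted error >= 0.5")

    def predict(row):
        votes = Counter()
        for h, alpha in ensemble:
            votes[h(row)] += alpha
        return votes.most_common(1)[0][0]

    return predict
```

For example, on a toy imbalanced set with 15 samples of class 0 and 5 of class 1, `adaboost_m1(X, y, fit_toy_forest, rounds=10, seed=1)` returns a predictor. The `base_fit` argument is the extension point: a wrapper around a full random-forest implementation could replace `fit_toy_forest` without changing the boosting loop.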
Keywords/Search Tags:Class imbalance, Encrypted traffic identification, Machine learning, Feature selection, Ensemble learning