Research Of The Application Type Of Encrypted Traffic Identification And Classification Technology Based On Deep Learning

Posted on: 2022-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: M H Chen
GTID: 2518306521457954
Subject: Cyberspace security
Abstract/Summary:
Traffic encryption plays an important role in protecting users' privacy and transmitting data safely, but as the technology develops it also makes it easier for malicious actors to hide their attacks and bypass network security monitoring. The technical characteristics of encrypted traffic pose great challenges to traditional traffic classification methods, such as port-based classification and classification based on payload keyword matching, and this weakness is a serious threat to network security. How to classify the application type of encrypted traffic quickly and efficiently without decrypting the payload has become a hot research topic in traffic analysis. In recent years, deep learning has achieved great success in fields such as image recognition, autonomous driving and text classification, and researchers have tried to transfer this work to traffic classification. Because deep-learning-based traffic recognition and classification is less affected by encryption, methods developed for unencrypted traffic carry over well to encrypted traffic, and this approach has become the key research direction for encrypted traffic classification. This thesis studies not only the core model design for encrypted traffic classification, but also two problems that arise in practical application scenarios: imbalanced data distribution and the inability to rapidly expand the upper limit on the number of classes. The main contributions of this thesis are as follows:

1. To address imbalanced data distribution in deep-learning-based encrypted traffic classification, a method for generating synthetic samples based on an improved G-SMOTE algorithm is proposed. The algorithm takes into account possible imbalance both between and within categories of encrypted traffic. First, the Canopy algorithm is used to estimate the number of clusters in each type of encrypted traffic, and the K-Means algorithm is then used to form the encrypted traffic clusters. Next, the sample-generation index is determined cluster by cluster. Finally, the improved G-SMOTE algorithm is used to generate samples. The improved G-SMOTE algorithm makes full use of the feature space of encrypted traffic samples, which efficiently addresses the insufficient diversity of minority classes in imbalanced data. The implementation is optimized with an approximate substitution method that greatly speeds up sample generation. The experimental results show that the improved G-SMOTE algorithm improves overall accuracy by an average of 16% on both the temporal and the spatial features of encrypted traffic, which is far better than some related work in this field. Moreover, it helps achieve higher recognition and classification accuracy for encrypted traffic categories with fewer samples.

2. For the core model design of deep-learning-based encrypted traffic classification, this thesis proposes an encrypted traffic classification framework based on an Attention-CNN model. The framework is divided into a feature extraction module and a final recognition and classification module. The feature extraction module addresses the limitation that existing research on encrypted traffic classification relies on a single feature dimension: a BiLSTM+Attention model and a 1D-CNN model are used to further compress and extract the temporal and spatial features of encrypted traffic, respectively. In the final recognition and classification module, an imbalance-handling component based on the improved G-SMOTE algorithm is introduced to keep the framework stable when the encrypted traffic data are imbalanced. The new temporal and spatial features obtained by the feature extraction module are concatenated to form the final basis for recognition. The framework is tested on two encrypted traffic datasets collected in different network environments and compared with related work on encrypted traffic classification based on LSTM, 1D-CNN and 2D-CNN. The Attention-CNN obtains the highest accuracy on both datasets, 99.1% and 98.6%. The experimental results indicate that the proposed framework can flexibly adjust the classification centers according to the specific situation of encrypted traffic in different network environments, while maintaining high classification accuracy and stability.

3. To address the problem that deep-learning-based encrypted traffic classification systems cannot expand the classification task quickly, this thesis proposes an extensible scheme based on the BiC layer structure. Because the structure of existing encrypted traffic classification models is fixed, they can classify different types of encrypted traffic but cannot expand the upper limit on the number of classes. If new types of encrypted traffic need to be classified, the model must be retrained. Retraining wastes a lot of time, and the accumulated knowledge of encrypted traffic classification is lost. This thesis introduces the BiC structure, a state-of-the-art approach to large-scale incremental learning, to improve the Attention-CNN encrypted traffic classification framework so that it can quickly expand the upper limit on the number of classes without 'forgetting' its existing classification ability. Comparative experiments are set up in three different traffic environments. The improved Attention-CNN+BiC framework achieves classification accuracies of 98.9%, 98.6% and 97.7% respectively, which are 0.7%, 0.5% and 1.4% higher than the basic version of incremental classification. The experimental results show that the scheme can alleviate the 'forgetting' problem in the incremental recognition of encrypted traffic application types.
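The bias-correction idea behind the BiC structure named in the third contribution can be illustrated with a small sketch. It is not the thesis implementation: it only shows the general BiC mechanism, in which the logits of newly added classes are rescaled by a two-parameter linear layer while old-class logits are left unchanged. In BiC the parameters alpha and beta are learned on a small held-out validation set with the rest of the model frozen; here they are hand-picked, and the logits, the class counts and the function names are hypothetical.

```python
import numpy as np

def bias_correct(logits, num_old, alpha, beta):
    """Apply a BiC-style linear correction to new-class logits only."""
    corrected = logits.astype(float).copy()
    # Old-class logits (columns < num_old) are kept as they are.
    corrected[:, num_old:] = alpha * corrected[:, num_old:] + beta
    return corrected

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical logits for 2 flows over 3 old and 2 new application types.
logits = np.array([[2.0, 0.5, 0.1, 2.4, 0.3],
                   [0.2, 0.1, 0.3, 3.0, 0.4]])

# Assumed values; in BiC these two scalars would be fitted, not chosen by hand.
corrected = bias_correct(logits, num_old=3, alpha=0.8, beta=-0.5)

pred_before = logits.argmax(axis=1)
pred_after = softmax(corrected).argmax(axis=1)
print(pred_before, pred_after)
```

In this toy input the first flow is initially assigned to a new class by a narrow margin and moves back to an old class after correction, while the second flow, whose new-class logit is clearly dominant, keeps its label. This is the kind of bias toward recently added classes that the correction layer is meant to reduce.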
Keywords/Search Tags:Deep Learning, Imbalanced Data Solution, Encrypted Traffic Classification, Incremental Learning, Feature Extraction of Encrypted Traffic