| With the continuous development of the Internet, network topologies are becoming increasingly complex, and the types and numbers of network applications keep growing. How network managers can efficiently manage complex networks has become an urgent problem. As a key technology of network management, network traffic classification and identification can improve network performance, communication quality, and resource utilization, and is of great significance for the timely detection of network anomalies, the prevention of malicious attacks, and the improvement of business security and stability. Existing research on network traffic classification and identification is mainly oriented toward supervised scenarios and relies on large amounts of network traffic data with class labels. However, labeling network traffic is difficult and time-consuming. In real applications, only a small amount of labeled traffic data can be annotated manually or collected, and sometimes no labeled traffic can be obtained at all. Moreover, because of data distribution differences between traffic domains, a classification model trained in the source traffic domain suffers degraded classification performance after being migrated to target traffic domains with few or no labels. To solve these problems, scholars use domain-adaptive transfer learning methods to reduce the data distribution differences between domains, transfer the knowledge learned in the labeled source traffic domain to the target traffic domain, and complete semi-supervised and unsupervised traffic classification tasks. However, existing research still faces several challenges: (1) In semi-supervised scenarios, existing methods have difficulty measuring distribution differences at category granularity, so different categories may be misaligned and the classification boundaries trained on the source traffic domain no longer apply; traditional methods use only a small amount of labeled traffic data to support model training, making 
it difficult to capture the feature information contained in the large amount of unlabeled traffic in the target domain. The model therefore lacks a comprehensive representation of the traffic domain and struggles to identify network traffic that has lost part of its information because of packet loss. (2) In unsupervised scenarios, existing methods lack multi-scale perception when dealing with multi-source traffic domains: they focus on obtaining abstract, global, higher-order features while neglecting the loss of local feature information, which especially hurts the identification of small traffic, itself comparatively hard to identify; when aligning distributions across multiple source traffic domains, they rely on a single alignment technique and cannot adjust dynamically to the actual data distribution; in addition, they have difficulty distinguishing the contribution of each source traffic domain to the target-domain traffic classification task, so irrelevant source domains mislead model training. At the same time, data heterogeneity between multi-source traffic domains and inconsistent classification boundaries among the multi-source classifiers cause significant traffic classification errors near class boundaries. To solve these problems, this paper proposes a network traffic classification and identification method based on domain-adaptive transfer learning. The main research contents and innovations are as follows: (1) For the semi-supervised (few-label) scenario, a network traffic classification and identification method based on single-source domain adaptation is proposed. First, a multi-granularity discriminative single-source domain adaptation method is proposed to reduce the data distribution differences between the source and target traffic domains at both domain-level and class-level granularity, while encouraging intra-class compactness and inter-class separation of traffic samples. Second, a Siamese sparse denoising stacked autoencoder is 
designed, which combines an unsupervised reconstruction loss with a supervised classification loss to guide training and fully extract the traffic characteristics of the target domain; Gaussian random noise and sparsity constraints are added to force the model to reconstruct the original traffic, so that network traffic that has lost part of its information can be identified effectively. Simulation experiments show that network traffic can be accurately classified and network applications effectively identified when only a small amount of labeled traffic data is available in the target domain. (2) For the unsupervised (unlabeled) scenario, a network traffic classification and identification method based on multi-source domain adaptation is proposed. First, a pyramid network based on dynamic multi-scale fusion convolution is designed to extract traffic features at different scales, fuse high-order and low-order features, and generate more discriminative feature representations. Second, a multi-source domain adaptation method based on collaborative distribution alignment is proposed, which aligns the data distributions of the source traffic domains and the target domain from three aspects: adversarial, marginal-distribution, and conditional-distribution alignment. The direction of distribution alignment is adjusted dynamically according to the distribution differences between domains. In addition, a consistency-calibrated weighted network traffic classifier based on decision-related certainty is proposed to measure the contribution of each source traffic domain to the target-domain traffic classification task, encourage the transfer of traffic characteristics from similar source domains, and suppress interference from irrelevant source domains. Through sample optimization and prediction consistency calibration, data heterogeneity is reduced and the decision boundaries of the multi-source classifiers are unified. Simulation experiments show that network traffic can be accurately classified and network applications can be identified 
effectively when the target-domain traffic data is completely unlabeled. |
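
The distinction between domain-level and class-level alignment in the semi-supervised contribution can be illustrated with a small sketch. The abstract does not name the discrepancy measure it uses, so the code below assumes a linear-kernel maximum mean discrepancy (the squared distance between feature means) purely for illustration. The function names `mean_discrepancy` and `multi_granularity_discrepancy` and the toy data are hypothetical and are not taken from the original work.

```python
import numpy as np


def mean_discrepancy(a: np.ndarray, b: np.ndarray) -> float:
    """Squared distance between feature means (linear-kernel MMD, an assumption)."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))


def multi_granularity_discrepancy(xs, ys, xt, yt):
    """Return (domain_level, class_level) discrepancies.

    xs, ys: source-domain features and labels.
    xt, yt: the few labeled target-domain features and their labels.
    """
    # Domain level: compare the two domains as a whole, ignoring labels.
    domain_level = mean_discrepancy(xs, xt)

    # Class level: compare each class separately, then average over
    # the classes that appear in both domains.
    shared = np.intersect1d(ys, yt)
    if shared.size == 0:
        return domain_level, 0.0
    class_level = float(np.mean(
        [mean_discrepancy(xs[ys == c], xt[yt == c]) for c in shared]
    ))
    return domain_level, class_level


# Toy data (hypothetical): the target occupies the same region of feature
# space as the source, but the two classes are swapped.
xs = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
ys = np.array([0, 0, 1, 1])
xt = xs.copy()
yt = np.array([1, 1, 0, 0])

domain_level, class_level = multi_granularity_discrepancy(xs, ys, xt, yt)
print(domain_level, class_level)  # prints "0.0 16.0"
```

In this toy case the domain-level term is zero even though every class is misaligned, which is the failure mode that a class-level term is meant to expose. In a training setting, both terms would typically be added to the classification loss with weighting coefficients, which are not specified in the abstract.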