With the rapid development of internet technology, people are increasingly concerned about protecting their privacy. In this context, anonymous networks have become an important tool for privacy protection. Through encryption and routing technology, anonymous networks hide the identity and location of the communicating parties, protecting user privacy and security. However, anonymous networks are not a perfect solution. Some criminals use them for illegal activities such as drug trafficking, gambling, terrorism, and pornography, and they are also used for hacking activities such as network attacks and phishing. These activities seriously threaten public safety and social stability, and regulating anonymous networks is a huge challenge.

As internet technology continues to develop, website traffic identification plays an increasingly important role in internet security. Traffic identification is one of the main means of analyzing anonymous networks. However, because content on the internet is constantly changing and being updated, the characteristics of website traffic also change over time, which gives rise to concept drift in the traffic. Traditional machine learning and deep learning models trained on static data cannot accurately capture these new behaviors, so traffic identification fails. How to achieve accurate traffic identification under concept drift in anonymous traffic is therefore a problem worth studying. The main contents of this article are as follows:

First, this article proposes an updating method for an anonymous traffic recognition model based on small-sample domain adaptation. Concept drift of the model is first detected by applying maximum mean discrepancy and permutation tests to data from different time windows. Small-sample domain adaptation is then performed with a classification loss and a contrastive semantic alignment loss to improve the model's performance in the target domain. Compared to the transfer learning method TLFA, the updating method based on small-sample domain adaptation is better suited to scenarios in which the source and target domains differ significantly: it better preserves the features of the target domain and reduces interference from the source domain. The method requires only a small number of samples to transfer from the source domain to the target domain. When concept drift occurs, the accuracy of the source model DF is improved from around 87% to approximately 97%, outperforming the TF-based and TLFA-based methods. With 1 and 5 samples in the target domain, the overall accuracy is approximately 2% higher than that of TLFA.

Second, this article proposes an unsupervised updating method for an anonymous traffic recognition model based on Maximum Classifier Discrepancy (MCD). This method can update a model affected by concept drift without requiring labeled samples, improving its accuracy from around 87% to 90%. Experiments also compare different unsupervised domain adaptation methods and different feature extraction algorithms, and examine the impact of different MCD update intervals on the model updating task. The experimental results show that the MCD-based method can use unlabeled drifting data to improve model performance and provides an effective unsupervised updating method for addressing concept drift in website fingerprint recognition.
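
The drift-detection step described above (maximum mean discrepancy combined with a permutation test on data from different time windows) can be illustrated with a minimal sketch. This is not the article's implementation. The RBF kernel, the bandwidth `gamma`, the number of permutations `n_perm`, the significance level `alpha`, and all function names are assumptions made for illustration, and the inputs are assumed to be fixed-length feature vectors extracted from each window.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def permutation_test(xs, ys, gamma=1.0, n_perm=200, seed=0):
    """Return (observed MMD^2, p-value) under the null that both windows share a distribution."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, gamma)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled samples to the two windows.
        rng.shuffle(pooled)
        if mmd2(pooled[:n], pooled[n:], gamma) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


def drift_detected(old_window, new_window, alpha=0.05, **kwargs):
    """Flag concept drift when the permutation p-value falls below alpha (hypothetical threshold)."""
    _, p_value = permutation_test(old_window, new_window, **kwargs)
    return p_value < alpha
```

In such a setup, `old_window` would hold features from the period the model was trained on and `new_window` features from recent traffic. A positive result would then trigger one of the updating methods summarized above.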