Internet users do not always behave benignly when they use privacy-protection technologies such as Tor to hide their web-browsing traces. Website fingerprinting (WF) attacks recognize website traffic from its fingerprint features. They can help network regulators monitor or stop malicious users who browse illegal websites or perform illegal operations on sensitive websites. However, WF attacks face several issues and challenges:

- how to improve attack accuracy against the strongly anonymizing Tor network;
- how to choose website fingerprint features that achieve higher benefit at lower overhead;
- how to reduce time overhead so that attacks can be deployed in real-world environments;
- how to cope with interference from a large number of unmonitored websites;
- how to handle the concept drift problem directly;
- whether WF attacks remain feasible against defense measures.

This paper proposes a scalable website traffic recognition framework, self-regulating learning of student (SRLS). SRLS performs WF attacks against the Tor network with only a small number of labeled traffic traces. The framework reduces attack overhead, copes with interference from a large number of unmonitored websites, decreases the accuracy loss caused by concept drift, and demonstrates its generalization to defense measures. Specifically, the main work and contributions of this paper are as follows.

(1) Verifying the feasibility of packet direction as a website fingerprint. We first build a deep convolutional neural network (DCNN) that achieves high and stable WF attack accuracy with a small labeled dataset. We use SMOTE to generate labeled datasets of different scales from small datasets. We then perform WF attacks with the DCNN, which demonstrates the feasibility of packet direction as a website fingerprint.

(2) Proposing a new semi-supervised WF attack framework, the persistent attack of student (PAS) framework. We analyze the time and memory overhead of PAS to show how it reduces the attack
overhead. PAS uses a self-training mechanism to alleviate concept drift. With the DCNN, PAS achieves 96.50%-98.88% accuracy in the closed world. Its time overhead is 0.7-0.8 times that of advanced deep learning WF attacks. It reaches 96.32% precision in an open world with 40,000 unmonitored websites. In addition, under 56 days of concept drift, the lowest accuracy loss of PAS is 8.07%, which is 2.27% better than that of advanced deep learning WF attacks.

(3) Presenting a novel scalable semi-supervised framework, self-regulating learning of student (SRLS). We analyze the WF attack flow of SRLS, which integrates two new mechanisms into PAS: self-regulating learning and automatic confidence threshold generation.

SRLS reduces the likelihood of concept drift by lowering the overhead of collecting datasets, filtering out invalid data, and labeling data. It learns new fingerprint features from unlabeled traffic traces. Its self-regulating learning mechanism learns the relationship between the new and old fingerprint features from these traces, which alleviates concept drift and improves attack accuracy. Its automatic confidence threshold generation mechanism enables fine-grained multi-class analysis in the open world and completes the WF attack automatically. Finally, we verify that SRLS can cope with defense measures. SRLS combines all of these advantages to implement automatic and scalable WF attacks.

Using 50-100 labeled traces per website, SRLS achieves 98% accuracy. Under 3-56 days of concept drift, its accuracy drops by only 0-15.53%, which is 3.11%-9.70% better than the most advanced deep learning WF attacks. The automatic confidence threshold generation mechanism takes 428 seconds to filter and label 400,000 unmonitored websites, and it achieves over 86% precision in the open world. The results show that the scalable SRLS framework has several advantages: an automatic WF attack flow, easy deployment, the
mitigation of the concept drift problem, and strong generalization ability.
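Contribution (1) uses the sequence of packet directions as the website fingerprint. The minimal sketch below shows one common way to build this kind of fixed-length classifier input. It makes three illustrative assumptions rather than reproducing the settings of this work: each trace is a list of signed packet sizes, positive sizes are outgoing (client to server) and negative sizes are incoming, and the input length defaults to 5,000. The function name `direction_sequence` is hypothetical.

```python
from typing import List, Sequence


def direction_sequence(signed_sizes: Sequence[int], length: int = 5000) -> List[int]:
    """Encode a trace as packet directions: +1 outgoing, -1 incoming (assumed sign convention).

    The result is truncated or zero-padded to a fixed length so that every
    trace fits a fixed-size classifier input (the default length is an assumption).
    """
    # Keep only the sign of each packet; zero-size entries carry no direction and are skipped.
    directions = [1 if size > 0 else -1 for size in signed_sizes if size != 0]
    directions = directions[:length]                      # truncate long traces
    return directions + [0] * (length - len(directions))  # zero-pad short traces


trace = [512, -1500, -1500, 512, -1500]
print(direction_sequence(trace, length=8))  # prints [1, -1, -1, 1, -1, 0, 0, 0]
```

Discarding packet sizes and timing leaves only a ±1 sequence. This is what makes direction cheap to collect and store, and padding with 0 keeps "no packet" distinct from either direction.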
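Contribution (1) enlarges small labeled datasets with SMOTE. SMOTE creates a synthetic sample in three steps. It picks a sample x of one class, then picks one of that sample's k nearest same-class neighbours x_nn, then draws a random λ in [0, 1) and emits x + λ(x_nn − x). The pure-Python sketch below illustrates only that interpolation step. The function name `smote` and the defaults k = 5 and seed = 0 are hypothetical choices, not the configuration used in this paper. A library implementation such as imbalanced-learn would normally be used in practice.

```python
import math
import random
from typing import List, Sequence


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(samples: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples for ONE class by SMOTE-style interpolation."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # k nearest same-class neighbours of the base sample (excluding itself)
        neighbours = sorted((j for j in range(len(samples)) if j != i),
                            key=lambda j: _distance(base, samples[j]))[:k]
        neighbour = samples[rng.choice(neighbours)]
        gap = rng.random()  # lambda drawn uniformly from [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


site_traces = [[1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
extra = smote(site_traces, n_new=4, k=2)
```

Every synthetic point lies on a segment between two real samples of the same website. When the inputs are ±1 direction sequences, the generated values are fractional, so the classifier must accept real-valued inputs.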
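PAS alleviates concept drift with a self-training mechanism. The general pseudo-labeling pattern behind such a mechanism has four steps:

1. Train on the labeled traces.
2. Predict the unlabeled traces.
3. Add the traces whose top confidence meets a threshold as pseudo-labeled data.
4. Retrain, repeating until no new traces are accepted.

The sketch below illustrates only that loop and should not be read as the PAS implementation. A nearest-centroid classifier with softmax confidences stands in for the DCNN. The function names, the threshold of 0.9, and the limit of five rounds are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(xs: Sequence[Vector], ys: Sequence[str]) -> Dict[str, List[float]]:
    """Stand-in classifier (not a DCNN): one mean vector (centroid) per website label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_proba(centroids: Dict[str, List[float]], x: Vector) -> Dict[str, float]:
    """Softmax over negative squared distances, used as a confidence score per label."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(xs_labeled: Sequence[Vector], ys_labeled: Sequence[str],
               xs_unlabeled: Sequence[Vector], threshold: float = 0.9,
               max_rounds: int = 5) -> Tuple[Dict[str, List[float]], List[Vector]]:
    """Pseudo-label confident unlabeled traces and retrain until nothing new is added."""
    xs, ys = list(xs_labeled), list(ys_labeled)
    pool = list(xs_unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)
        remaining = []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                xs.append(x)         # accept the pseudo-label
                ys.append(label)
            else:
                remaining.append(x)  # too uncertain; retry next round
        if len(remaining) == len(pool):
            break                    # no new pseudo-labels: converged
        pool = remaining
    return fit_centroids(xs, ys), pool


model, unlabeled_left = self_train([[0, 0], [10, 10]], ["a", "b"], [[1, 1], [9, 9], [5, 5]])
```

In this usage example, the traces near each centroid are absorbed as pseudo-labeled data. The ambiguous trace at [5, 5] has a confidence of 0.5 for both sites, so it stays unlabeled. The threshold trades how much unlabeled data is used against the risk of reinforcing wrong pseudo-labels.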
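The internals of the automatic confidence threshold generation mechanism are not described here. The sketch below therefore shows one simple way to derive an open-world threshold automatically, purely as an assumption and not as the SRLS mechanism. It scans candidate thresholds on a held-out set that contains both monitored and unmonitored traces, and it picks the smallest threshold whose open-world precision reaches a chosen target. A trace is flagged as a monitored site only if its top confidence is at least the threshold. Precision is the fraction of flagged traces assigned to their true monitored site. The record layout and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (top confidence, predicted monitored site, true site or None if the trace is unmonitored)
Record = Tuple[float, str, Optional[str]]


def open_world_precision(records: Iterable[Record], threshold: float) -> float:
    """Precision over traces flagged as monitored (confidence >= threshold)."""
    flagged = [(pred, true) for conf, pred, true in records if conf >= threshold]
    if not flagged:
        return 0.0  # nothing flagged; treated as 0 here by convention
    correct = sum(1 for pred, true in flagged if pred == true)
    return correct / len(flagged)


def pick_threshold(records: Iterable[Record], target_precision: float) -> Optional[float]:
    """Smallest observed confidence whose open-world precision reaches the target."""
    records = list(records)
    for tau in sorted({conf for conf, _, _ in records}):
        if open_world_precision(records, tau) >= target_precision:
            return tau
    return None  # target precision unreachable on this validation set


validation = [(0.95, "a", "a"), (0.90, "b", "b"), (0.80, "a", None),
              (0.70, "c", "c"), (0.60, "b", None)]
tau = pick_threshold(validation, target_precision=0.75)
```

In this example, `tau` is 0.7. At that threshold, four traces are flagged and three of them are correct. Choosing the smallest qualifying threshold keeps as many monitored traces as possible while still meeting the precision goal. Precision need not rise monotonically with the threshold, which is why every candidate is checked.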