Anonymous communication systems prevent the leakage of users' communication content and communication relationships through encryption and rerouting. However, while protecting public privacy, anonymous communication systems also provide a shelter for various forms of cybercrime. To combat cybercrime, anonymous traffic analysis techniques based on machine learning, notably traffic classification and website fingerprinting, have been studied extensively. Today's network environment is complex and web content is updated frequently, so traffic analysis techniques need both strong feature learning capability and the ability to adjust the classification model flexibly. This thesis therefore separates feature extraction from classification. Using deep metric learning, a deep learning model is trained as a feature extractor that captures the features a traffic analysis task needs despite interference from the complex network environment. A classical machine learning model serves as the classifier, so the classification model can be adjusted flexibly. Based on this architecture, the thesis makes the following contributions.

(1) Because the same web pages are visited from different platforms, a cross-platform website fingerprinting technique based on Multi-Similarity Loss is proposed. Multi-Similarity Loss guides the deep learning model to extract representative website fingerprints from anonymous traffic and reduces the impact of platform differences on traffic features. As a result, an attacker can train the classifier on traffic generated on one platform and use it to fingerprint traffic generated on all platforms. The proposed method achieves over 87% accuracy in the cross-platform website fingerprinting scenario, significantly ahead of existing solutions. It also achieves over 96% accuracy in the single-platform scenario, comparable to the best existing solution.

(2) In webpage traffic identification, effective features are hard to extract and updating the classification model is costly. To address this, an anonymous webpage traffic identification technique based on Ranked List Loss is proposed. Traffic features are built from the sequence of packet directions, which gives the deep learning model useful input, and the feature extraction model, trained under Ranked List Loss, extracts effective deep traffic features. Compared with the existing machine-learning-based scheme, accuracy increases by 5% to 86.9%, the TPR reaches 89%, and the FPR drops to 10.8%; the FPR is also lower than that of the existing deep-learning-based scheme. In addition, we apply data augmentation to anonymous webpage identification, which effectively counters the impact that uncertain start and end positions of captured anonymous traffic have on identification.

(3) Combining the two methods above and following practical deployment requirements, we designed and implemented a fully functional, easy-to-use cross-platform website fingerprinting system. The user configures the web pages to be monitored through the user interface; the system then automatically completes model training and traffic monitoring and displays the results in the user interface.
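Contribution (1) relies on Multi-Similarity Loss to pull traces of the same website together in embedding space and push traces of different websites apart. The following is a minimal pure-Python sketch of the commonly used Multi-Similarity Loss formulation (hard-pair mining followed by log-sum-exp weighting) over one batch of embeddings. It is not the thesis's implementation: the function names and the hyperparameter values `alpha`, `beta`, `lam`, and `eps` are illustrative assumptions, and a real training loop would compute this on tensors inside a deep learning framework.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multi_similarity_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    alpha: float = 2.0,  # assumed scale for positive pairs
    beta: float = 50.0,  # assumed scale for negative pairs
    lam: float = 0.5,    # assumed similarity threshold (lambda)
    eps: float = 0.1,    # assumed margin used for pair mining
) -> float:
    """Multi-Similarity Loss averaged over every sample in the batch used as an anchor."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos = [cosine(embeddings[i], embeddings[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [cosine(embeddings[i], embeddings[j])
               for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor contributes nothing without both pair types
        # Mining: keep negatives more similar than the hardest positive (minus eps)
        # and positives less similar than the hardest negative (plus eps).
        hard_neg = [s for s in neg if s + eps > min(pos)]
        hard_pos = [s for s in pos if s - eps < max(neg)]
        # Weighting: soft log-sum-exp penalties on the mined pairs only.
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in hard_pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in hard_neg)) / beta
        total += pos_term + neg_term
    return total / n
```

With perfectly separated classes no pair survives mining and the loss is zero; when traces of different websites share the same embedding, the negative term dominates and the loss becomes large, which is the signal that drives the extractor toward platform-independent fingerprints.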
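The separated architecture described above keeps the deep feature extractor fixed and puts a classical classifier on top, so the set of monitored websites can change without retraining the network. The abstract does not name the classical classifier; the sketch below assumes a k-nearest-neighbour classifier over embeddings purely for illustration. The class `EmbeddingKNN` and its methods are hypothetical, and the 2-D vectors in the test stand in for real embeddings.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class EmbeddingKNN:
    """Hypothetical k-NN classifier over embeddings from an already-trained feature extractor."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.refs: List[Tuple[Sequence[float], str]] = []

    def add(self, embedding: Sequence[float], label: str) -> None:
        # Monitoring a new website only requires storing reference embeddings;
        # the deep feature extractor is not retrained.
        self.refs.append((embedding, label))

    def remove_label(self, label: str) -> None:
        # Dropping a website from the monitored set is equally cheap.
        self.refs = [(e, lab) for e, lab in self.refs if lab != label]

    def predict(self, embedding: Sequence[float]) -> str:
        # Majority vote among the k reference embeddings closest in Euclidean distance.
        nearest = sorted(self.refs, key=lambda r: math.dist(r[0], embedding))[: self.k]
        return Counter(lab for _, lab in nearest).most_common(1)[0][0]
```

The design choice this illustrates is that "updating the classification model" reduces to editing a list of reference embeddings, which is what makes frequent content updates affordable. A different classical model (e.g. an SVM) would need refitting, but that is still far cheaper than retraining the deep extractor.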
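Contribution (2) builds its input from the packet direction sequence and uses data augmentation to cope with uncertain start and end positions of captured traffic. The sketch below shows one plausible way to do both; every detail is an assumption rather than the thesis's exact procedure: packets are given as signed sizes (positive for outgoing, negative for incoming), sequences are truncated or zero-padded to a hypothetical fixed length of 5000, and augmentation randomly drops up to `max_head` packets from the start and up to `max_tail` from the end. The helper names `direction_sequence`, `random_crop`, and `augment` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def direction_sequence(signed_sizes: Sequence[int], length: int = 5000) -> List[int]:
    """Encode a trace as +1 (outgoing) / -1 (incoming), truncated or zero-padded to `length`."""
    dirs = [1 if s > 0 else -1 for s in signed_sizes if s != 0]
    dirs = dirs[:length]
    return dirs + [0] * (length - len(dirs))


def random_crop(trace: Sequence[int], max_head: int, max_tail: int,
                rng: Optional[random.Random] = None) -> List[int]:
    """Drop a random number of packets (up to max_head / max_tail) from each end of a trace."""
    if rng is None:
        rng = random.Random()
    head = rng.randint(0, max_head)  # randint is inclusive on both ends
    tail = rng.randint(0, max_tail)
    end = max(head, len(trace) - tail)  # never produce a negative-length slice
    return list(trace[head:end])


def augment(trace: Sequence[int], copies: int, max_head: int, max_tail: int,
            length: int = 5000, seed: int = 0) -> List[List[int]]:
    """Produce several cropped, fixed-length direction sequences from one captured trace."""
    rng = random.Random(seed)
    return [direction_sequence(random_crop(trace, max_head, max_tail, rng), length)
            for _ in range(copies)]
```

Training on several randomly cropped copies of each trace exposes the feature extractor to the same page captured with slightly different boundaries, so the learned embedding depends less on exactly where a capture starts or stops.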