
Research On Tor Website Fingerprinting Based On Machine Learning

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: X K Meng
Full Text: PDF
GTID: 2518306047486774
Subject: Cyberspace security
Abstract/Summary:
An encrypted communication network aims to hide both the relationship between the two communicating parties and the content they exchange. Once the two parties have established an encrypted channel, the content of their communication is encrypted, and routing information such as source and destination IP addresses is hidden from third parties. When a user visits a web page over such a channel, the resulting series of requests and responses forms encrypted web page traffic. Website fingerprinting identifies this traffic: it reveals which web pages a user visited without decrypting the user's data, and can therefore be used to censor users' online content.

The most important step in website fingerprinting is to ensure that the traffic being identified belongs to a single web page. Existing website fingerprinting models assume single-page traffic. If the encrypted traffic under inspection mixes the traffic of several web pages, the website cannot be identified accurately, which limits the use of website fingerprinting in real attack scenarios. It is therefore essential to identify the first packet of each page's encrypted traffic, that is, the split point of the encrypted web page traffic, and so obtain the traffic of a single web page for fingerprinting.

To address these challenges, we propose a machine-learning-based website fingerprinting scheme for the Tor anonymity network. The scheme consists of two models: web traffic split point recognition and website fingerprint recognition. The split point recognition model divides the packets into sequences at a chosen time granularity, constructs and extracts time-series features for each sequence, and uses machine learning to identify the first sequence of each web page's set of sequences as the split point of the web page traffic. Because the numbers of split points and non-split points are imbalanced, we run data imbalance experiments on the dataset and propose a solution to the imbalance of the split point dataset. We also evaluate feature computation efficiency and recognition accuracy at different time granularities. The results show that our method recognizes split points better than existing work.

The website fingerprinting model is based on how the number of packets is distributed across the different stages of a web page transfer in the Tor browser. Features are extracted by accumulating the lengths of a subset of the packets, and a support vector machine classifies the extracted features to identify different websites. The number of packets used affects both the efficiency and the accuracy of website fingerprinting, so we experimentally evaluate the number of intercepted packets and choose the optimal value. Compared with existing work, our method achieves comparable recognition accuracy. Finally, we design a Tor website fingerprint recognition model for a real traffic environment and apply it to traffic data sets collected in practice, verifying the effectiveness of the scheme for inspecting anonymous Tor website traffic.
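
The two feature-extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the window width, the per-window features, or the sampling method, so everything here is an assumption. The function names (`split_into_windows`, `window_features`, `cumulative_length_features`), the sign convention (positive length for outgoing packets, negative for incoming), the three per-window features, and the linear interpolation of the running sum are all hypothetical choices made for illustration.

```python
from itertools import accumulate


def split_into_windows(packets, granularity):
    """Group (timestamp, signed_length) packets into fixed-width time windows.

    Assumes packets are sorted by timestamp. Window k covers
    [t0 + k*granularity, t0 + (k+1)*granularity), with t0 the first timestamp.
    Empty windows are kept so that idle gaps remain visible.
    """
    if not packets:
        return []
    t0 = packets[0][0]
    n_windows = int((packets[-1][0] - t0) // granularity) + 1
    windows = [[] for _ in range(n_windows)]
    for ts, length in packets:
        windows[int((ts - t0) // granularity)].append((ts, length))
    return windows


def window_features(window):
    """Hypothetical per-window features: packet count, bytes out, bytes in."""
    outgoing = sum(length for _, length in window if length > 0)
    incoming = sum(-length for _, length in window if length < 0)
    return [len(window), outgoing, incoming]


def cumulative_length_features(lengths, n_packets, n_points):
    """Sample the running sum of the first n_packets signed lengths.

    The running sum is linearly interpolated at n_points evenly spaced
    positions, giving a fixed-size vector regardless of trace length.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    trace = list(accumulate(lengths[:n_packets]))
    if not trace:
        return [0.0] * n_points
    last = len(trace) - 1
    features = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        features.append(trace[lo] + frac * (trace[hi] - trace[lo]))
    return features
```

In such a setup, the per-window vectors would feed a split point classifier and the cumulative vectors would feed a classifier such as the support vector machine named in the abstract. The `n_packets` parameter plays the role of the number of intercepted packets that the thesis tunes experimentally.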
Keywords/Search Tags:Tor anonymous network, Website fingerprinting, Machine learning, Web traffic split point identification, Packet timing, Packet interception