
Research On Tor Hidden Service Subject Classification

Posted on: 2022-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: M L Xu
Full Text: PDF
GTID: 2518306605470144
Subject: Master of Engineering
Abstract/Summary:
With the development of network technology and the growing number of users, the Internet has become increasingly large. It is now generally divided into the "surface web" and the "deep web". The surface web can be indexed by public search engines; the deep web cannot. The "dark web" is usually considered a subset of the deep web. It is composed of anonymous networks such as Tor, I2P and ZeroNet, among which Tor (the second-generation onion routing network) is the most widely used. A Tor hidden service is a service, such as a website, that can only be accessed through the Tor network. Hidden services are the main carrier of information in the Tor network and in the dark web as a whole. Because of their strong secrecy, however, they are now widely used by criminals and host a large amount of sensitive content, such as trade in guns, drugs and credit cards. Studying the topic classification of Tor hidden services is therefore of great significance for preventing and tracing cybercrime and activities that endanger national security.

To remain hidden, a Tor hidden service publishes its domain name only within a small circle, and the domain name changes frequently. Links between hidden services are sparse, and some services are isolated entirely. As a result, Tor hidden service domain names are difficult to collect, and the topics of their content are identified inefficiently. This thesis therefore focuses on automatic data collection and topic classification for Tor hidden services.

(1) Topic classification of Tor hidden services based on traditional machine learning. First, the contents of Tor hidden services are collected as datasets by deploying an overseas Tor directory server and a crawler server. Traditional machine learning algorithms are then used to design the topic classification model. Because existing work on Tor hidden service classification does not take web page structure into account, different weights are assigned to the title and the body text of each hidden service web page. Building on the original TF-IDF feature weighting algorithm, the thesis trains and evaluates classifiers with four machine learning algorithms. The improved TF-IDF algorithm improves the classification results.

(2) Topic classification of Tor hidden services based on the BERT model. Because feature engineering is complex and large-scale Tor hidden service content is difficult to handle, this thesis studies a topic classification method based on the BERT model. The experimental results show that the BERT model outperforms SVM, the best of the traditional machine learning algorithms, by 4.2% in classification performance, which provides a basis for further study of Tor hidden service classification.

(3) Prototype system design for Tor hidden service topic classification. Based on the above research and an actual project, this thesis designs and implements a topic classification system for Tor hidden service content. Because the classification models based on traditional machine learning and on deep learning each have their own advantages and disadvantages, the system integrates both and selects between them according to the needs of the application scenario. The system collects, processes and classifies Tor hidden service data, and can support further research on Tor hidden services.
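The title/body weighting described in item (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the weight values, the tokenizer, or the exact TF-IDF variant, so `TITLE_WEIGHT`, `BODY_WEIGHT`, the whitespace tokenizer, the smoothed IDF formula, and the sample pages below are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract does not state the values used.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def weighted_tf(title, body):
    """Term frequencies in which title occurrences count more than body ones."""
    counts = Counter()
    for term in tokenize(title):
        counts[term] += TITLE_WEIGHT
    for term in tokenize(body):
        counts[term] += BODY_WEIGHT
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def idf(pages):
    """Smoothed inverse document frequency over (title, body) pages."""
    n = len(pages)
    df = Counter()
    for title, body in pages:
        # Each page contributes at most once per term.
        df.update(set(tokenize(title)) | set(tokenize(body)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def weighted_tfidf(pages):
    """One sparse {term: weight} vector per page."""
    idf_values = idf(pages)
    return [
        {term: tf * idf_values[term] for term, tf in weighted_tf(t, b).items()}
        for t, b in pages
    ]


# Made-up sample pages, for illustration only.
pages = [
    ("market listing", "items for sale in this market"),
    ("forum rules", "read the rules before posting in the forum"),
]
vectors = weighted_tfidf(pages)
```

With these assumed weights, a term that appears once in the title and once in the body ("market" in the first page) receives four times the weight of a term with the same document frequency that appears only once in the body ("sale"). Vectors of this kind could then be passed to a conventional classifier such as an SVM.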
Keywords/Search Tags:Tor, hidden services, machine learning, text classification