Research And Implementation Of Active Defense Technology For Malicious Crawlers

Posted on:2020-12-26

Degree:Master

Type:Thesis

Country:China

Candidate:H B Wu

GTID:2428330572472272

Subject:Information security

Abstract/Summary:

With the rapid development of informatization, the Internet has become an indispensable part of daily life, and people can enjoy its convenience without leaving home. However, the Internet is a double-edged sword: convenience and security are hard to achieve together, so a huge number of users and websites are exposed to danger because of security problems. Malicious websites, malware and Trojan horses of all kinds pose a serious threat to users' personal privacy and property. They not only cause economic losses to users but also endanger social and national security. These network attacks are becoming more complex and more automated. Because the Internet spreads content rapidly and new types of malicious web pages keep emerging, such pages are very difficult to detect.

This thesis analyses the attack and detection technology of malicious web pages and proposes a malicious web page detection method based on context information, to solve the problem of insufficient text feature extraction in URL detection. A malicious web page detection system that combines this method with static source code detection is designed and implemented. The main work and achievements are as follows:

(1) A malicious URL detection method based on context information is proposed. It addresses the shortcomings of traditional detection methods based on text features, which ignore the position of words and the context information in URLs. The method extracts text features automatically, in particular the relationships between words in the URL, which reduces manual intervention.

(2) Within the context-based malicious URL detection method, this thesis analyses the differences between URL classification and text classification, studies the common attack and obfuscation methods applied to URLs, and tokenizes and preprocesses the URL. It proposes an improved edit distance algorithm, based on the visual similarity between characters, to calculate the similarity of domain names. The open source tool Word2vec is used to generate word vectors, and a convolutional neural network is constructed for the classification of short texts such as URLs. According to the experimental comparison, this detection method improves the accuracy, recall and false alarm rate of URL classification compared with the traditional bag-of-words model and the support vector machine algorithm. Static source code detection of web pages, based on machine learning algorithms, then compensates for the incompleteness of detecting malicious web pages from URL text features alone. Combining the advantages of the two detection technologies, a detection method is designed that maintains the detection rate while keeping resource consumption low.

(3) Based on the above method, a malicious web page detection system is designed and implemented. The design and implementation of the main modules of the system are described, and the detection capability and efficiency of the whole system are tested.
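The abstract does not give the details of the improved edit distance or of the URL tokenization step, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes a standard Levenshtein distance in which substituting visually confusable characters (for example `0` and `o`) costs less than substituting unrelated ones. The names `VISUAL_COST`, `substitution_cost`, `visual_edit_distance` and `tokenize_url`, as well as the cost values, are hypothetical and chosen for illustration.

```python
import re

# Hypothetical table of visually confusable character pairs and the cost of
# substituting one for the other (0 = identical, 1 = unrelated).
# The pairs and values are illustrative assumptions, not taken from the thesis.
VISUAL_COST = {
    frozenset("0o"): 0.2,
    frozenset("1l"): 0.2,
    frozenset("1i"): 0.3,
    frozenset("il"): 0.3,
    frozenset("5s"): 0.4,
    frozenset("2z"): 0.4,
}


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for equal characters, a reduced cost for look-alikes, else 1."""
    if a == b:
        return 0.0
    return VISUAL_COST.get(frozenset((a, b)), 1.0)


def visual_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance with visually weighted substitutions.

    Insertions and deletions cost 1; substitutions use substitution_cost.
    Comparison is case-insensitive because domain names are.
    """
    s, t = s.lower(), t.lower()
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                        # delete a from s
                curr[j - 1] + 1.0,                    # insert b into s
                prev[j - 1] + substitution_cost(a, b),  # substitute a with b
            ))
        prev = curr
    return prev[-1]


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (a simple assumption)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]
```

Under these assumed costs, a look-alike domain such as `paypa1` is scored as much closer to `paypal` than a domain that differs by an unrelated character, which is the property a visual-similarity measure is meant to capture. The tokens produced by `tokenize_url` are the kind of input that could then be passed to a word-embedding tool such as Word2vec.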

Keywords/Search Tags:

malicious web pages, deep learning, text categorization, machine learning

Related items

1	Text Categorization On Machine Learning Algorithm
2	Research On Text Categorization Technology Based On Deep Learning
3	Research On The Method Of Chinese Text Categorization Based On Machine Learning
4	Research On Malicious Web Page Recognition Based On Feature Fusion And Machine Learning
5	A Study On Text Categorization Based On Machine Learning
6	A Study On Optimization Of Pre-trained Chinese Word Embedding In Transfer Learning
7	Research On Chinese Text Categorization
8	Research On Malicious URL Detection Technology Based On Machine Learning
9	Research And Implementation Of Text Classification Based On Depth Learning Theory And SVM Technology
10	The Research And Application Of Text Categorization Based On Machine Learning