
Research And Implementation Of Active Defense Technology For Malicious Crawlers

Posted on: 2020-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: H B Wu
GTID: 2428330572472272 | Subject: Information security
Abstract/Summary:
With the rapid development of informatization, the Internet has become an indispensable part of daily life, letting people enjoy its conveniences without leaving home. However, the Internet is a double-edged sword: convenience and security can never both be perfect, and a huge number of users and websites are exposed to security risks. Malicious websites, malware, and Trojan horses on the Internet pose a serious threat to users' privacy and property. They not only cause economic losses to users but also endanger social and national security. These network attacks are becoming increasingly complex and automated, and the rapid spread of the Internet together with the emergence of many types of malicious web pages makes them very difficult to detect. This thesis analyzes attack and detection techniques for malicious web pages and proposes a malicious web page detection method based on context information, which addresses the problem of insufficient text feature extraction in URL detection. It also designs and implements a malicious web page detection system that combines this method with static source code detection. The main work and contributions are as follows:

(1) A malicious URL detection method based on context information is proposed to overcome the shortcomings of traditional text-feature-based methods, which ignore the positions of words and lack context information in URLs. The method extracts text features automatically, in particular the relationships between words in a URL, which reduces manual intervention.

(2) For this context-based method, the thesis analyzes the differences between URL classification and general text classification, studies common URL attack and obfuscation techniques, and segments and preprocesses URLs. It proposes an improved edit distance algorithm, based on the visual similarity between characters, to compute the similarity of domain names. The open-source tool Word2vec is used to generate word vectors, and a convolutional neural network is built to classify short texts such as URLs. Experimental comparisons show that, relative to the traditional bag-of-words model and support vector machine algorithm, this method improves the accuracy and recall of URL classification and lowers its false alarm rate. In addition, machine-learning-based detection of web page source code compensates for the incompleteness of detecting malicious web pages from URL text features alone. Combining the strengths of the two techniques, a detection method is designed that maintains the detection rate while keeping resource consumption low.

(3) Based on the above methods, a malicious web page detection system is designed and implemented. The design and implementation of its main modules are described, and the detection capability and efficiency of the whole system are tested.
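The improved edit distance in point (2) can be illustrated with a minimal Python sketch. It assumes a weighted Levenshtein distance in which substituting a visually confusable pair (for example 0/o or 1/l) costs less than an ordinary substitution. The pair table, the 0.2 cost, and the names `CONFUSABLE`, `visual_edit_distance`, and `domain_similarity` are hypothetical illustrations, not the thesis's actual table or weights.

```python
# Sketch only: the confusable pairs and costs below are assumptions,
# not the values used in the thesis.

# Hypothetical table of visually confusable character pairs (unordered).
CONFUSABLE = {
    frozenset("0o"), frozenset("1l"), frozenset("1i"),
    frozenset("li"), frozenset("5s"), frozenset("8b"),
}
LOOKALIKE_COST = 0.2  # hypothetical cost for swapping look-alike characters


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with character b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return LOOKALIKE_COST
    return 1.0


def visual_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance in which look-alike substitutions are cheap."""
    s, t = s.lower(), t.lower()
    prev = [float(j) for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = min(
                prev[j] + 1.0,                                # delete s[i-1]
                curr[j - 1] + 1.0,                            # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),   # substitute
            )
        prev = curr
    return prev[len(t)]


def domain_similarity(s: str, t: str) -> float:
    """Normalize the distance by the longer name to get a score in [0, 1]."""
    longest = max(len(s), len(t)) or 1
    return 1.0 - visual_edit_distance(s, t) / longest


if __name__ == "__main__":
    for candidate in ("paypal", "paypa1", "paypbl", "g00gle"):
        print(candidate, round(domain_similarity("paypal", candidate), 3))
```

Under these assumptions `paypa1` is much closer to `paypal` (distance 0.2) than `paypbl` is (distance 1.0), which is the property a look-alike domain check needs. Multi-character look-alikes such as `rn` versus `m` are not handled by this single-character table.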
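The URL segmentation and preprocessing step in point (2) can also be sketched. This version assumes a simple regular-expression tokenizer that keeps letter runs, digit runs, and each punctuation character as separate tokens in their original order, so that word positions and neighboring context survive into the Word2vec and CNN stages. The `<NUM>` and `<PAD>` placeholders, the default length of 20, and the function names are illustrative assumptions, not the thesis's preprocessing rules.

```python
import re

# Hypothetical tokenizer: letter runs, digit runs, or any single other character.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^A-Za-z\d]")


def tokenize_url(url: str) -> list[str]:
    """Split a URL into ordered tokens, replacing digit runs with <NUM>."""
    tokens = []
    for tok in TOKEN_RE.findall(url.lower()):
        # Collapsing numbers keeps the vocabulary small (an assumption).
        tokens.append("<NUM>" if tok.isdigit() else tok)
    return tokens


def to_fixed_length(tokens: list[str], max_len: int = 20, pad: str = "<PAD>") -> list[str]:
    """Pad or truncate so every URL yields the same number of tokens."""
    return (tokens + [pad] * max_len)[:max_len]


if __name__ == "__main__":
    toks = tokenize_url("http://paypal.com.secure-login.xyz/a1?id=42")
    print(toks)
    print(to_fixed_length(toks, 8))
```

For example, `http://paypal.com.secure-login.xyz/a1?id=42` becomes `http`, `:`, `/`, `/`, `paypal`, `.`, `com`, and so on, with both digit runs replaced by `<NUM>`. Padding or truncating to a fixed length gives a convolutional network a constant input size; each token would then be mapped to its Word2vec vector.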
Keywords/Search Tags: malicious web pages, deep learning, text categorization, machine learning