Research On Malicious URL Detection Technology Based On Machine Learning

Posted on: 2022-12-08   Degree: Master   Type: Thesis
Country: China   Candidate: J C Zhu   Full Text: PDF
GTID: 2518306764479224   Subject: Automation Technology
Abstract/Summary:
The development of the Internet has made people's lives increasingly convenient, but it has also given attackers an efficient platform for committing crimes. Cyber criminals search for security loopholes in the network and exploit them, causing huge losses to society. More and more criminals now seek profit by promoting malware, silently spreading viruses, and stealing account information by fraud. Such attacks often require nothing more than a slight modification of a website's URL. This low-cost form of cybercrime has seriously damaged the Internet ecosystem, so researchers have put considerable effort into malicious URL detection.
In current machine learning approaches, features are mainly extracted by hand by experienced security experts, and over time an increasingly rich set of features has been proposed and applied. However, many researchers concentrate on the character features of URLs and overlook the fact that URLs originate from a constantly changing network. In addition, when deep neural networks are used to classify URLs, the fully connected layer requires inputs of a fixed size. Some researchers therefore adopt a single-layer CNN with 1-max pooling in the pooling stage, so that the feature dimension equals the number of feature maps. Although this solves the problem of non-uniform input length, no further convolution can be applied afterwards. Other researchers fix the length of every URL at S characters by padding or truncation: a URL shorter than S is padded with 0, and a URL longer than S is truncated. Truncation discards information and padding introduces noise.
Based on these considerations, the research content and innovations of this thesis are reflected in the following three aspects:
(1) To address the poor detection results caused by incomplete feature dimensions, a malicious URL detection model based on URL network activity features is proposed. Building on existing research, this thesis expands the feature categories used for URL detection by integrating network activity data, such as HTTP request features, URL access features, filing information features, IP binding features, and TTL values, into the existing features. The effectiveness of the network activity features is verified with machine learning algorithms such as Naive Bayes, Support Vector Machine, Logistic Regression, and Random Forest.
(2) To address the information loss and noise caused by inconsistent URL lengths in the data processing stage, a dynamic convolutional neural network is proposed, and on this basis a dynamic multi-layer neural network model based on char-word embedding is proposed. A pooling-parameter adaptive strategy is incorporated into the hidden layers of the network, so that the size of the pooling window is adjusted dynamically according to the length of the input characters and the depth of the model. This simplifies data processing and allows the model to learn both the local correlations and the forward and backward sequential characteristics of the samples. Experiments show that the dynamic multi-layer neural network performs better than the earlier approaches.
(3) The DCBLA-RF model is designed and implemented. To use both the URL network activity features proposed in this thesis and the URL word-formation features extracted by the improved dynamic multi-layer neural network for malicious URL detection, two fusion model frameworks are designed, one based on a weighted voting mechanism and one based on feature normalization. Multiple fusion models are implemented within these frameworks, and the detection results of each model are analyzed through multiple sets of experiments. Finally, the advantages of the DCBLA-RF model in malicious URL detection are established on the basis of experiments and theory.
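The two existing ways of handling variable-length URLs that the abstract criticizes can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names `encode_fixed_length` and `one_max_pool`, the use of Unicode code points as character ids, and the value S = 16 are assumptions chosen only for illustration.

```python
def encode_fixed_length(url, s):
    """Encode a URL as character ids, truncated or zero-padded to length s.

    Character ids are Unicode code points here (an illustrative assumption);
    id 0 is reserved for padding.
    """
    ids = [ord(ch) for ch in url[:s]]   # characters past position s are lost
    return ids + [0] * (s - len(ids))   # short URLs are filled with 0


def one_max_pool(feature_maps):
    """Keep only the largest activation of each feature map.

    The output length equals the number of feature maps, whatever the
    length of each map, so no sequence is left for a further convolution.
    """
    return [max(fmap) for fmap in feature_maps]


S = 16  # hypothetical fixed length
short = encode_fixed_length("http://a.io", S)
long_url = encode_fixed_length("http://example.com/login?next=/account", S)

# Two feature maps of different lengths collapse to two numbers.
pooled = one_max_pool([[0.1, 0.9, 0.3], [0.5, 0.2]])
```

In this sketch the short URL ends with five padding zeros (the noise), the long URL keeps only its first 16 characters (the information loss), and `pooled` holds one value per feature map regardless of input length.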
Keywords/Search Tags: Malicious URL, Network Activity Feature, Machine Learning, Dynamic Networks, Deep Learning