Research On Phishing URL Detection Technology Based On Deep Learning

Posted on: 2022-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
Full Text: PDF
GTID: 2518306326494874
Subject: Master of Engineering
Abstract/Summary:
With the continuous expansion of network scale and the ongoing development of network applications, the Internet has become an indispensable infrastructure in human life. At the same time, cyber attacks have become increasingly rampant, posing serious hidden dangers to cyberspace security. Among them, phishing is a form of online fraud: attackers use phishing webpages to trick users into entering account names, passwords, and other information, stealing users' private information and property and causing great losses. How to detect phishing webpages accurately and efficiently is therefore a research hotspot in network security. Scholars at home and abroad have proposed many different types of detection methods. Among them, detecting phishing webpages by automatically extracting URL features with deep learning requires neither fetching web content nor manually extracting features, which makes it an efficient and accurate detection method. However, existing deep-learning-based phishing URL detection methods still have the following problems:

1) Commonly used URL segmentation methods cause sensitive words to lose effective information, cannot obtain word embedding vectors for new words, or cannot capture the relationship between special characters and the characters before and after them.
2) The detection models in current methods do not extract the features of URL data comprehensively enough. For example, they do not jointly consider the spatial local features and the sequence features of URL data, or they ignore long-distance, non-contiguous words in the URL data.
3) Most current detection models are static and cannot effectively learn constantly changing data features, so their accuracy gradually declines and their stability is insufficient.
4) Current deep-learning-based detection methods do not consider the robustness of the detection model itself, although AI models are vulnerable to adversarial samples. An attacker generates adversarial samples by adding carefully constructed perturbations to URL samples in order to reduce the accuracy of the detection model or even render it ineffective.

Based on a National Natural Science Foundation project, this paper studies the above problems. Its research contents and innovations are as follows:

1) To address information loss in URL segmentation, a segmentation method based on sensitive words is proposed. The method first splits the URL at its special characters and treats each special character as a word, preserving the information the special characters carry. It then splits non-sensitive words at the character level while keeping each sensitive word whole and distinct from the remaining characters. The key information in the URL is thus clearly marked, which helps the neural network classifier extract more representative features.
2) To address insufficiently comprehensive feature extraction from URL data, this paper first proposes a phishing URL detection method based on CNN-BiLSTM. The method combines the strengths of a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM): the CNN automatically extracts the spatial local features of the data and the BiLSTM automatically extracts its temporal features, which effectively improves accuracy, precision, recall, and F1 score. Then, because existing detection models cannot capture long-distance, non-contiguous word dependencies in URLs, a phishing URL detection method based on MPAN is proposed. This method constructs the URL as a bidirectional acyclic graph and obtains the interaction information between words through MPAN. Experiments show that this method effectively improves the ability to detect phishing URLs.
3) To address the low stability and low robustness of existing detection models, a multi-classifier phishing URL detection method based on the mimic architecture is proposed. The method uses the mimic architecture proposed by Academician Wu as its basic framework and combines two deep learning models, CNN-BiLSTM and MPAN, to detect phishing URLs. Dynamically schedulable and reconfigurable heterogeneous redundant classifiers automatically extract features from the raw data for detection; the results of the multiple classifiers are then aggregated and adjudicated to produce the final detection result, which improves the robustness of the detection method. Through incremental updates, the classifiers learn the features of new data, which improves the stability of the detection method.
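
The sensitive-word segmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `segment_url`, the word list `SENSITIVE_WORDS`, and the longest-match rule are assumptions made for illustration, since the abstract specifies neither the vocabulary nor the matching strategy.

```python
import re

# Hypothetical sensitive-word list; the abstract does not give the real one.
SENSITIVE_WORDS = ("login", "secure", "account", "verify", "paypal")


def segment_url(url, sensitive_words=SENSITIVE_WORDS):
    """Tokenize a URL into special characters, whole sensitive words,
    and single characters for all remaining text."""
    tokens = []
    # Step 1: split at special characters; each one becomes its own token.
    for chunk in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url):
        if not chunk.isalnum():
            tokens.append(chunk)
            continue
        # Step 2: scan the alphanumeric chunk from left to right.
        lowered = chunk.lower()
        i = 0
        while i < len(chunk):
            # Assumed rule: take the longest sensitive word starting at i.
            match = max(
                (w for w in sensitive_words if lowered.startswith(w, i)),
                key=len,
                default=None,
            )
            if match:
                tokens.append(match)      # sensitive word kept whole
                i += len(match)
            else:
                tokens.append(chunk[i])   # everything else, char by char
                i += 1
    return tokens


print(segment_url("http://paypal-login.com/a1"))
```

In this sketch `paypal` and `login` survive as single tokens, the separators `:`, `/`, `-`, and `.` are tokens in their own right, and the remaining text is split into characters, so no embedding lookup is needed for unseen words.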
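
The adjudication step of the multi-classifier method can be sketched in the same spirit. The abstract says only that classifiers are dynamically scheduled and that their results are "summarized and judged"; the random scheduling and majority vote below are assumptions, and the three rule-based functions are hypothetical stand-ins for the trained CNN-BiLSTM and MPAN models.

```python
import random
from collections import Counter


# Hypothetical stand-in classifiers; the thesis uses trained deep models.
def by_keyword(url):
    return "phishing" if "login" in url.lower() else "benign"


def by_length(url):
    return "phishing" if len(url) > 40 else "benign"


def by_hyphens(url):
    return "phishing" if url.count("-") >= 2 else "benign"


def mimic_vote(url, classifier_pool, k=3, rng=random):
    """Schedule k classifiers from the pool and majority-vote their labels.

    Returns the winning label and the fraction of active classifiers
    that agreed with it. Use an odd k to avoid ties on binary labels.
    """
    # Assumed scheduling policy: draw k distinct classifiers at random.
    active = rng.sample(classifier_pool, k)
    votes = Counter(clf(url) for clf in active)
    label, count = votes.most_common(1)[0]
    return label, count / k


pool = [by_keyword, by_length, by_hyphens]
print(mimic_vote("http://secure-paypal-login.example.com/verify", pool))
```

Because the active set changes between calls when the pool is larger than `k`, an adversarial URL crafted against one classifier does not necessarily defeat the ensemble, which is the intuition behind using heterogeneous redundancy for robustness.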
Keywords/Search Tags: Deep learning, phishing URL, URL segmentation, feature extraction, mimicry architecture