Research On Phishing Webpages Detection Based On Machine Learning

Posted on: 2019-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ding
Full Text: PDF
GTID: 2428330566967001
Subject: Computer application technology
Abstract/Summary:
This thesis takes phishing detection as its main line of research and studies in depth the categories of detection features for phishing webpages, webpage filtering techniques, and machine-learning-based classification of phishing webpages. Through analysis of collected webpage data and detailed experiments, it validates the proposed webpage filtering techniques and phishing detection method, and the results largely meet expectations. The main contributions are the following three points:

(1) We propose a legitimate-webpage filtering method based on search engines. The phishing websites studied here are fake websites that imitate legitimate webpages in order to steal users' private data, so the keywords of a phishing webpage are often not associated with its domain tag. Traditional methods decide the nature of a webpage directly from the results returned by a search engine. In contrast, this method uses the search-engine results only to confirm that a webpage is legitimate and never uses them to assign a webpage to any other class. The experimental results show that the proposed method filters out 62% of legitimate webpages with an error rate of only 0.01%, which improves the real-time performance of phishing detection.

(2) We propose a phishing-webpage filtering method based on heuristic rule matching. For URL obfuscation techniques such as violating URL naming conventions, hiding the words that name the phishing target, and adding junk characters, the nature of a phishing webpage can be determined quickly by string pattern matching. We build a heuristic rule base for commonly used URL obfuscation techniques. It quickly filters out some types of phishing webpages, so the remaining feature extraction steps can be skipped for them, which serves the needs of real-time detection. The experimental results show that the proposed method filters out 28% of phishing webpages with an error rate of only 0.09%, which improves the real-time performance of phishing detection.

(3) We propose the SHLR phishing detection method. SHLR consists of three parts: the legitimate-webpage filtering method based on search engines, the phishing-webpage filtering method based on heuristic rule matching, and a phishing classifier based on logistic regression. The three parts complement one another to preserve detection quality, meet real-time detection requirements, and improve the adaptability of the method. In addition, this thesis proposes a detection feature called DNS doubt and introduces the Jaccard similarity between strings to detect randomly generated phishing URLs. The experimental results show that, compared with four other phishing detection methods, SHLR improves accuracy by 0-3.2%, recall by 0.2%-3.4%, and F1-score by 0.1%-3.3%, and shortens the decision time for a single URL by 4.54-7.84?s. SHLR also shows better adaptability.
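The three-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not list the actual heuristic rules, the search-engine query logic, the classifier features, or how Jaccard similarity is applied. The three rules in matches_heuristic_rule, the use of character bigrams in jaccard_similarity, the 0.5 threshold, and the two callables passed to classify_url are all assumptions made for illustration.

```python
import re
from typing import Callable
from urllib.parse import urlparse

# Assumed rule: a host written as a raw IPv4 address is suspicious.
IPV4_HOST = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def matches_heuristic_rule(url: str, max_dots: int = 4) -> bool:
    """Return True if the URL matches one of three illustrative obfuscation rules."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if IPV4_HOST.fullmatch(host):
        return True  # IP address used instead of a domain name
    if "@" in parsed.netloc:
        return True  # text before '@' hides the real host
    return host.count(".") > max_dots  # unusually deep subdomain nesting


def char_bigrams(text: str) -> set:
    """Set of lowercase character 2-grams of a string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over character bigrams (assumed tokenization)."""
    set_a, set_b = char_bigrams(a), char_bigrams(b)
    union = set_a | set_b
    if not union:
        return 0.0  # both strings too short to form a bigram
    return len(set_a & set_b) / len(union)


def classify_url(
    url: str,
    search_engine_says_legitimate: Callable[[str], bool],
    phishing_probability: Callable[[str], float],
) -> str:
    """Cascade: legitimate filter, then heuristic rules, then a classifier.

    Both callables are hypothetical stand-ins for the search-engine filter
    and the logistic regression classifier named in the abstract.
    """
    if search_engine_says_legitimate(url):
        return "legitimate"  # stage 1 only ever confirms legitimacy
    if matches_heuristic_rule(url):
        return "phishing"  # stage 2 short-circuits feature extraction
    # Stage 3: fall back to the (assumed) probability from the classifier.
    return "phishing" if phishing_probability(url) >= 0.5 else "legitimate"
```

The ordering mirrors the real-time argument in the abstract: the two cheap filters decide a share of URLs early, so only the remainder pays for full feature extraction and classification.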
Keywords/Search Tags: Machine learning, Phishing, URL obfuscation technology, Webpage filtering