A Phishing Website Detection Method Based On Stacking Model

Posted on: 2020-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Li
Full Text: PDF
GTID: 2428330596995390
Subject: Control engineering
Abstract/Summary:
With the steady improvement of hardware and the widespread use of the Internet, browsing the web has become an increasingly important part of people's daily lives. In a phishing attack, criminals spoof the login page of a well-known website and trick users into logging in, thereby obtaining their private information. In recent years, the number of phishing attacks has grown at an alarming rate, and the forms of attack have changed considerably. Phishing attacks are deceptive, highly targeted, and short-lived, which makes it difficult for people without anti-phishing training to identify phishing websites manually.

In industry, the most widely used detection method is blacklists and whitelists combined with rules. However, phishing webpages generally have a short lifetime, which makes it costly to maintain a large and up-to-date list database; in addition, manually written rules are easily bypassed by phishers. In academia, the most widely studied approach in recent years is to detect phishing webpages through machine learning, which offers high accuracy and robustness. However, training machine learning models requires a large amount of data, and public datasets of phishing websites are very rare; in addition, overly complicated systems are relatively slow and cannot be used in real-time environments.

To address these problems, this thesis proposes a phishing webpage detection system based on stacking that uses multi-source features.

1. In terms of data, this thesis collects a real-world dataset, named 50K-IPD, containing the URLs, HTML, and page screenshots of 53,103 webpages.

2. In terms of multi-source features, three main feature sources are used: the webpage's URL, its HTML source code, and browser-rendered page screenshots. Among them, the URL and HTML features are lightweight and do not depend on any third-party services, which makes it possible to develop a real-time phishing webpage detection system.

3. In terms of model, a stacking model is designed. The model combines three machine learning algorithms (GBDT, XGBoost, and LightGBM) in a multi-layer structure, which enables the different algorithms to complement each other and improves the performance of the detection system. On the 50K-IPD dataset, the proposed approach achieves an accuracy of 98.60%, a missing alarm rate of 1.28%, and a false alarm rate of 1.54%. It outperforms the other machine learning algorithms and the peer methods it was compared with. The experiments show that the proposed method is feasible for detecting phishing webpages.

4. Further, after a phishing webpage is identified, we propose a method for identifying its phishing target. We collected a screenshot dataset, named 9K-PCD, containing 9,013 webpages. The webpages are grouped by the object they spoof into 113 classes, each of which is a phishing target with no fewer than 10 samples. A deep convolutional neural network (CNN) is used to train the classification model, which achieves an accuracy of 92.31% and an F1 value of 93.66% in identifying phishing targets.
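
The abstract reports accuracy, missing alarm rate, and false alarm rate without giving their formulas. The sketch below shows how these three figures are commonly computed from binary predictions, assuming that "positive" means phishing, that the missing alarm rate is the share of phishing pages the detector misses (FN / (TP + FN)), and that the false alarm rate is the share of legitimate pages wrongly flagged (FP / (FP + TN)). These definitions, the helper names, and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary task where label 1 means phishing."""
    tp: int  # phishing pages flagged as phishing
    fn: int  # phishing pages that were missed
    fp: int  # legitimate pages flagged as phishing
    tn: int  # legitimate pages passed as legitimate


def count_outcomes(y_true, y_pred):
    """Tally TP/FN/FP/TN from parallel sequences of 0/1 labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def accuracy(c):
    """Fraction of all pages classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def missing_alarm_rate(c):
    """Assumed definition: fraction of phishing pages that were missed."""
    return c.fn / (c.tp + c.fn)


def false_alarm_rate(c):
    """Assumed definition: fraction of legitimate pages wrongly flagged."""
    return c.fp / (c.fp + c.tn)


# Toy labels for illustration only (not data from 50K-IPD).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = count_outcomes(y_true, y_pred)
print(f"accuracy           = {accuracy(counts):.4f}")
print(f"missing alarm rate = {missing_alarm_rate(counts):.4f}")
print(f"false alarm rate   = {false_alarm_rate(counts):.4f}")
```

A low missing alarm rate and a low false alarm rate pull in different directions, so reporting both alongside accuracy gives a fuller picture than accuracy alone when the two classes are imbalanced. The functions assume each class appears at least once; otherwise the denominators would be zero.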
Keywords/Search Tags:phishing website detection, machine learning, stacking model, phishing target recognition, classification algorithm