Phishing Website Recognition Based On Model Fusion

Posted on:2024-09-24

Degree:Master

Type:Thesis

Country:China

Candidate:J L Hu

Full Text:PDF

GTID:2558307058480724

Subject:Applied Statistics

Abstract/Summary:

As web-based exchange of user information becomes ever more frequent, web security faces serious challenges. Phishing is one of them: attackers distribute fake links through various channels to lure users into logging in, then steal their private information, which ultimately leads to privacy disclosure and financial loss. To create a safe network environment and protect Internet users' property, it is therefore important to build an effective phishing website identification model that can flag suspicious websites in time. This thesis builds such a model on the phishing website detection dataset published on Kaggle in 2021.

First, the research background, significance, and the state of research in China and abroad are reviewed, and the theory behind the algorithms used is introduced. A descriptive statistical analysis of the feature variables follows, comparing legitimate and phishing websites in terms of URL structure, page content, and external query services. Second, data preprocessing and feature engineering are carried out: outliers are removed, and features are screened with variance filtering and the RF-RFE (random forest recursive feature elimination) algorithm, which eliminates 17 redundant feature variables. Single classifiers are then built, and those with the best predictive performance are retained as base models. Finally, the fusion model is constructed. The dataset is split into training and test sets at a ratio of 7:3, and XGBoost, LightGBM, and Random Forest, the strongest single models, are used to build a traditional Stacking model. Because this model performs poorly, the Stacking ensemble idea is applied to redesign its first layer: the dataset is divided into three parts according to data source, and base classifiers of the same kind are trained on the data from each source and fused, giving an XGBoost-Stacking model and a LightGBM-Stacking model whose evaluation metrics are then compared.

The results show that the LightGBM-Stacking model predicts best. Bayesian optimization is then used to tune the model's parameters globally, which further improves the fusion model's performance. Compared with an improved Stacking phishing website recognition model from the literature, the Bayesian-optimized LightGBM-Stacking model achieves a relative increase of 1.45% in recall, a relative decrease of 32.71% in FNR (false negative rate), and a relative increase of 2.43% in AUC, indicating better predictive performance and a robust model.
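The two-stage feature screening described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the data are synthetic (`make_classification` plus two constant columns) rather than the Kaggle table, the variance threshold of 0 and the target of 10 retained features are arbitrary, and `screen_features` is a hypothetical helper name. Variance filtering first drops constant columns; `RFE` then repeatedly refits a random forest and removes the feature with the lowest impurity-based importance until `n_keep` remain.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, VarianceThreshold


def screen_features(X, y, var_threshold=0.0, n_keep=10, seed=0):
    """Hypothetical helper: variance filter, then RF-RFE.

    Returns the column indices (into the original X) that survive both stages.
    """
    # Stage 1: keep only columns whose variance exceeds the threshold.
    vt = VarianceThreshold(threshold=var_threshold)
    X_var = vt.fit_transform(X)
    kept_after_var = np.flatnonzero(vt.get_support())

    # Stage 2: recursive feature elimination driven by random-forest importances;
    # one feature (the least important) is dropped per round until n_keep remain.
    rfe = RFE(
        estimator=RandomForestClassifier(n_estimators=100, random_state=seed),
        n_features_to_select=n_keep,
        step=1,
    )
    rfe.fit(X_var, y)
    return kept_after_var[rfe.get_support()]


# Synthetic stand-in for the phishing feature table (assumption, not the real data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)
X = np.hstack([X, np.zeros((X.shape[0], 2))])  # two constant columns the filter should drop
selected = screen_features(X, y, n_keep=10)
print(len(selected), selected)
```

On the real data one would more likely pick `n_keep` by cross-validation (scikit-learn's `RFECV` does this) than fix it by hand.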
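Bayesian optimization fits a surrogate model to the scores observed so far and evaluates next wherever an acquisition function is highest. The sketch below uses a Gaussian-process surrogate (`GaussianProcessRegressor` with a Matérn kernel) and expected improvement over a single parameter on a grid. The abstract does not state which surrogate or acquisition function the thesis used, so these choices, the helper names `expected_improvement` and `bayes_opt_1d`, and the toy objective are assumptions. In practice `objective` would return a cross-validated score such as AUC for one LightGBM-Stacking hyperparameter setting, and real tuning searches several parameters at once.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(mu, sigma, best, xi=1e-3):
    """EI for maximisation under a Gaussian posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero at already-sampled points
    gain = mu - best - xi
    z = gain / sigma
    return gain * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt_1d(objective, low, high, n_init=4, n_iter=15, seed=0):
    """Maximise `objective` on [low, high] with a GP surrogate and EI acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(low, high, 501)              # candidate settings
    xs = list(rng.uniform(low, high, size=n_init))  # random initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(np.reshape(xs, (-1, 1)), np.asarray(ys))
        mu, sigma = gp.predict(grid.reshape(-1, 1), return_std=True)
        x_next = float(grid[np.argmax(expected_improvement(mu, sigma, max(ys)))])
        xs.append(x_next)
        ys.append(objective(x_next))
    i = int(np.argmax(ys))
    return xs[i], ys[i]


def toy_objective(x):
    # Hypothetical stand-in for a cross-validated AUC; peaks at x = 0.8.
    return -(x - 0.8) ** 2


best_x, best_y = bayes_opt_1d(toy_objective, 0.0, 1.0)
print(round(best_x, 3), round(best_y, 5))
```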
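The evaluation indicators quoted above can be computed directly, as in this standard-library sketch. It assumes the usual definitions with phishing as the positive class (label 1): recall = TP/(TP+FN), FNR = FN/(TP+FN) = 1 - recall, and AUC as the probability that a randomly chosen phishing site scores higher than a randomly chosen legitimate one. A "relative" change is taken to be (new - old)/old. Under those assumptions, a +1.45% relative change in recall and a -32.71% relative change in FNR are only compatible with a baseline recall of roughly 0.958; the hypothetical helper `implied_baseline_recall` solves for it. The toy labels and scores are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) with label 1 = phishing (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def recall(tp, fn):
    return tp / (tp + fn)


def false_negative_rate(tp, fn):
    # Share of phishing sites the model lets through; equals 1 - recall.
    return fn / (tp + fn)


def auc(y_true, scores):
    """Rank-based AUC: P(phishing score > legitimate score), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_change(old, new):
    return (new - old) / old


def implied_baseline_recall(rel_recall, rel_fnr):
    """Solve -rel_recall * r / (1 - r) = rel_fnr for baseline recall r (assumes FNR = 1 - recall)."""
    return -rel_fnr / (rel_recall - rel_fnr)


# Made-up toy labels and scores, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(recall(tp, fn), false_negative_rate(tp, fn), auc(y_true, scores))  # 0.75 0.25 0.9375

r0 = implied_baseline_recall(0.0145, -0.3271)
print(round(r0, 4))  # 0.9576
```

Because the reported percentages are rounded, the implied baseline is approximate; the point is that a small relative gain in recall translates into a large relative drop in FNR whenever recall is already high.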

Keywords/Search Tags:

Phishing Website Identification, Machine Learning, LightGBM-Stacking Fusion Model, Bayesian Optimization

Related items

1	A Phishing Website Detection Method Based On Stacking Model
2	Research On Phishing Detection Based On The Link Features Of Website
3	Research On Stacking Fusion Model Recommendation Algorithm Based On Implicit Feedback Feature: Take Music And Reading Platforms As Examples
4	Design And Implementation Of Phishing Website Detection System Based On Hybrid Feature Selection Framework
5	Research On Phishing Website Identification Based On Intelligent Algorithm
6	Construction And Application Of Internet Financial Loan Default Prediction Model Based On Stacking Fusion Algorithm
7	Phishing Detection Technology Based On URL And Web Page Features
8	Research On Phishing Website Detection Based On Data Mining Classification Algorithm
9	Research On The Target Identification Of Phishing Based On URL
10	Research On Enterprise Credit Assessment Based On Model Fusion