Research On Phishing Webpage Recognition Based On Deep Learning

Posted on:2019-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:T Z Nan

Full Text:PDF

GTID:2428330566991418

Subject:Computer application technology

Abstract/Summary:

With the spread of networks and computers, network security problems have grown as well, and identifying phishing webpages is among the most urgent of them. Four approaches to phishing webpage identification are currently in common use: blacklists, heuristics, image recognition, and machine learning. Each has clear drawbacks. Blacklist, heuristic, and image recognition methods have a high miss rate because they cannot capture the characteristics of new phishing webpages in time. Machine learning methods are usually shallow models that generalize poorly on complex classification problems, which leads to a high false positive rate. Existing studies suggest that deep learning can effectively address these problems. After comparing several deep learning model frameworks, this paper adopts the autoencoder as the framework for identifying phishing webpages. An autoencoder is a simple three-layer network consisting of an encoding layer, a hidden layer, and a decoding layer; encoding and then decoding the features yields a more essential representation of them. This paper first analyzes the URL and the source code of a webpage and divides the features into URL text features, DNS features, WHOIS features, ranking features, and page content features. Features extracted from each category form a 52-dimensional feature vector, and missing features are then filled in. The resulting vectors serve as the input to the autoencoder model. At present, the parameters of deep learning models are tuned mainly in three ways: manual tuning, grid search, and random search, each of which has many shortcomings. This paper proposes an adaptive algorithm for optimizing the number of hidden-layer nodes based on node weight correlation: it introduces correlation coefficient theory to adjust the number of hidden-layer nodes automatically and thereby optimize the current layer structure. To verify the algorithm, this paper evaluates six performance measures, namely accuracy, recall, false positive rate, false negative rate, true positive rate, and true negative rate, and the results demonstrate the algorithm's effectiveness. This paper then applies ensemble learning to the autoencoder's classification results, adopting a modified weighted voting algorithm to handle missing features, which improves accuracy. Next, the autoencoder with the optimal structure is compared with the support vector machine and naive Bayes algorithms, and the results confirm the autoencoder's effectiveness. Finally, three normalization methods are applied separately to the input feature vectors, further improving recognition performance.
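
The abstract does not give the details of the node-number optimization algorithm, so the following is only a minimal sketch of the general idea of using a correlation coefficient between hidden-node weight vectors to spot redundant nodes. The function names (pearson, prune_correlated_nodes), the 0.95 threshold, and the sample weights are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length weight vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if std_u == 0 or std_v == 0:
        return 0.0  # a constant vector is treated as uncorrelated
    return cov / (std_u * std_v)


def prune_correlated_nodes(weights, threshold=0.95):
    """Return the indices of hidden nodes to keep.

    weights[j] is the incoming weight vector of hidden node j.
    A node is dropped when its weights are strongly correlated
    (|r| >= threshold) with a node that has already been kept.
    The threshold value is a hypothetical choice.
    """
    kept = []
    for j, w in enumerate(weights):
        if all(abs(pearson(w, weights[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical weights for three hidden nodes with four inputs each.
hidden_weights = [
    [0.2, -0.5, 0.9, 0.1],
    [0.4, -1.0, 1.8, 0.2],   # a scaled copy of node 0, so it is redundant
    [-0.7, 0.3, 0.1, 0.6],
]
print(prune_correlated_nodes(hidden_weights))  # [0, 2]
```

In this sketch the hidden layer would shrink from three nodes to two, because node 1 carries the same weight pattern as node 0. An adaptive scheme could repeat such a check during training, but how the thesis actually adds or removes nodes is not stated in the abstract.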

Keywords/Search Tags:

Phishing web, Deep Learning, Autoencoder, Recognition, Feature, Correlation coefficient

Related items

1	Research On Phishing URL Detection Technology Based On Deep Learning
2	Research And Application On Feature Learning Method Based On Deep Autoencoder Neural Network
3	A Research Of Phishing Detection Technology Based On Deep Learning
4	Research And Application On Autoencoder Based Feature Learning Model Of Neural Network
5	Research On Detection Method Of Phishing Web Page Based On Deep Learning
6	Research On Feature Extraction Of Face Image Based On Gabor-LDA And K-Algorithm Of Correlation Coefficient
7	Optimizing Deep Learning Algorithm Based On Noisy Autoencoder
8	The Application Of Deep Learning In Handwritten Numeral Recognition
9	Research On Stack Hybrid Autoencoder And Transfer Learning In Facial Expression Recognition
10	Facial Expression Recognition Via Deep Learning