
Research On Phishing Webpage Recognition Based On Deep Learning

Posted on: 2019-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: T Z Nan
Full Text: PDF
GTID: 2428330566991418
Subject: Computer application technology
Abstract/Summary:
With the spread of networks and computers, network security problems have arisen, and identifying phishing webpages is one of the most urgent among them. Four methods are currently in common use for identifying phishing webpages: blacklist methods, heuristic methods, image recognition, and machine learning. Each has obvious disadvantages. The blacklist, heuristic, and image recognition methods have a high miss rate because they cannot obtain the characteristics of new phishing webpages in time. Machine learning methods are usually shallow learners that generalize poorly on complex classification problems, so their false positive rate is high. Studies indicate that deep learning can effectively address these problems. After comparing various deep learning model frameworks, this paper adopts the autoencoder as the model framework for identifying phishing webpages. The autoencoder is a simple three-layer network model consisting of an encoding layer, a hidden layer, and a decoding layer; after the features are encoded and decoded, a more essential representation of them can be obtained. This paper first analyzes the URL and the source code of the webpage and divides the features into URL text features, DNS features, WHOIS features, ranking features, and page content features. The features extracted in each category form a 52-dimensional feature vector, and missing features are then filled in. The constructed vectors serve as the input data of the autoencoder model. At present, the parameters of deep learning models are mainly tuned in three ways: manually, by grid search, and by random search. Each has many shortcomings. This paper proposes an adaptive algorithm for optimizing the number of hidden-layer nodes based on node weight correlation: by introducing correlation coefficient theory, it automatically adjusts the number of hidden-layer nodes and thereby optimizes the structure of the current layer. To verify the correctness of the algorithm, this paper analyzes six performance measures, namely accuracy, recall, false positive rate, false negative rate, true positive rate, and true negative rate, and the results demonstrate the algorithm's effectiveness. Next, ensemble learning is applied to the classification results of the autoencoder, using a modified weighted voting algorithm adapted to missing features, which improves accuracy. The autoencoder with the optimal structure is then compared with the support vector machine and naive Bayes algorithms, and the results demonstrate the effectiveness of the autoencoder. Finally, three normalization improvements are applied separately to the input feature vectors, further improving recognition performance.
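The abstract does not spell out how node weight correlation is turned into a node count, so the following is only a minimal illustrative sketch, not the thesis's algorithm. It assumes that each hidden node is described by its incoming weight vector, that the Pearson correlation coefficient is the measure used, and that a node is considered redundant when its weights correlate strongly with those of a node already kept. The function name `redundant_hidden_nodes`, the threshold of 0.9, and the random weights are all hypothetical.

```python
import numpy as np


def redundant_hidden_nodes(weights, threshold=0.9):
    """Split hidden nodes into (keep, drop) by weight correlation.

    weights: array of shape (n_inputs, n_hidden); column j holds the
    incoming weights of hidden node j (an assumed layout).
    A node is dropped when the absolute Pearson correlation between its
    weight vector and that of an already kept node reaches the threshold.
    """
    # np.corrcoef treats each row as one variable, so transpose to get
    # one row per hidden node; the result has shape (n_hidden, n_hidden).
    corr = np.corrcoef(weights.T)
    keep, drop = [], []
    for j in range(corr.shape[0]):
        if any(abs(corr[j, k]) >= threshold for k in keep):
            drop.append(j)
        else:
            keep.append(j)
    return keep, drop


# Hypothetical example: 52 input features, 6 hidden nodes.
rng = np.random.default_rng(0)
W = rng.normal(size=(52, 6))
# Make node 4 an almost exact scaled copy of node 1.
W[:, 4] = 2.0 * W[:, 1] + 0.01 * rng.normal(size=52)

keep, drop = redundant_hidden_nodes(W)
print("keep:", keep)
print("drop:", drop)
```

In this sketch the suggested hidden-layer size would be `len(keep)`; an adaptive scheme could retrain with that size and repeat until no node is dropped, though whether the thesis proceeds this way is not stated in the abstract.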
Keywords/Search Tags:Phishing web, Deep Learning, Autoencoder, Recognition, Feature, Correlation coefficient