Research On A Method For Phishing Webpage Detection Based On DOM Structure Clustering

Posted on: 2020-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2428330590959394
Subject: Computer application technology
Abstract/Summary:
Frequent phishing attacks threaten the security and stability of social platforms, and phishing detection has become an important research task in maintaining cyberspace security. As phishing techniques evolve, the content features extracted by traditional detection methods are no longer suitable for new webpages, and these methods also suffer from high computational complexity. Building on a survey of existing research, this thesis therefore treats webpage type discrimination as a problem of comparing and clustering webpages, and detects phishing webpages with a clustering method based on DOM (Document Object Model) structure. The main work is as follows.

(1) To address the high complexity and low accuracy of similarity calculation in webpage feature analysis, structural information is used to construct webpage feature vectors, and an improved TCDC (Tag Class Difference Calculation) algorithm is proposed. Webpage similarity is measured by a composite score that combines the difference between tag vectors with the difference between style attribute vectors. This compensates for the traditional method's neglect of the order and importance of webpage tags. In addition, the DSC (DOM Structure Clustering) algorithm is proposed. The ICPS (Initial Center Point Selection) algorithm selects the initial set of centers, and the training webpages are then partitioned iteratively using the optimized webpage similarity to obtain the clustering result. An unknown webpage is categorized by comparing its structural similarity with the cluster centers, and its type is determined by the label of the matching cluster. Experimental results show that the similarity calculated by the algorithm is more accurate, and that detection achieves a higher TPR (True Positive Rate) and a lower FPR (False Positive Rate).

(2) To address the time cost of webpage comparison, a compression algorithm is applied to the webpage fingerprint generation process, which speeds up comparison. The FG (Fingerprint Generation) algorithm, based on improved compression coding, produces a compressed representation of a webpage while retaining the order of its features. In the first stage, the compression algorithm is used to obtain the coding sequence of the webpage's tags, and the shallow coding information is selected as the initial fingerprint. In the second stage, repeated codes undergo a secondary compression conversion, and the resulting coding sequence is used as the webpage's fingerprint. Once fingerprints have been generated, they are compared using the FC (Fingerprint Comparison) algorithm. Experimental results show that the proposed fingerprint generation algorithm achieves better TPR and FPR than the classical fingerprint generation algorithm. Compared with the earlier direct vector comparison method, it also reduces the time needed for webpage vector comparison and speeds up classification of the webpage under test.
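The similarity-and-assignment idea in (1) can be illustrated with a minimal Python sketch. It is not the thesis's TCDC or DSC algorithm, whose details the abstract does not give. As stand-ins, it assumes an order-sensitive sequence ratio for the tag comparison, Jaccard overlap of class names for the style attribute comparison, and a hypothetical `tag_weight` for the composite score. The names `extract`, `similarity`, and `classify` are illustrative only.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class StructureExtractor(HTMLParser):
    """Collects the ordered tag sequence and the set of class names of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def extract(html):
    """Return (tag sequence, class-name set) for an HTML string."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.classes


def similarity(page_a, page_b, tag_weight=0.6):
    """Composite score in [0, 1]; tag_weight is an assumed, untuned value."""
    tags_a, classes_a = page_a
    tags_b, classes_b = page_b
    # Order-sensitive comparison of the two tag sequences (stand-in measure).
    tag_score = SequenceMatcher(None, tags_a, tags_b).ratio()
    # Jaccard overlap of class names (stand-in for the style attribute vector).
    union = classes_a | classes_b
    class_score = len(classes_a & classes_b) / len(union) if union else 1.0
    return tag_weight * tag_score + (1 - tag_weight) * class_score


def classify(unknown, centers):
    """Return the label of the most similar center; centers holds (features, label) pairs."""
    _, label = max(centers, key=lambda center: similarity(unknown, center[0]))
    return label
```

In use, each cluster center would be a representative page with a known label, and an unknown page would take the label of its closest center, for example `classify(extract(html), centers)`.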
Keywords/Search Tags: Phishing Webpage, DOM Structure, Clustering, Webpage Fingerprint, Compression Coding
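The two-stage fingerprint idea described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the thesis's FG or FC algorithm. It assumes that "shallow coding information" means tags at or above a hypothetical `max_depth` in the DOM tree, uses plain run-length encoding as a stand-in for the secondary compression of repeated codes, and compares fingerprints with a generic sequence ratio. Tag names are kept as the codes for readability.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from itertools import groupby

# HTML void elements have no closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ShallowTagCollector(HTMLParser):
    """Stage 1: records tags, in document order, down to max_depth."""

    def __init__(self, max_depth):
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if self.depth <= self.max_depth:
            self.tags.append(tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def fingerprint(html, max_depth=3):
    """Return an order-preserving list of (tag, run_length) pairs."""
    collector = ShallowTagCollector(max_depth)
    collector.feed(html)
    collector.close()
    # Stage 2: collapse consecutive repeats of the same tag (run-length encoding).
    return [(tag, len(list(run))) for tag, run in groupby(collector.tags)]


def compare(fp_a, fp_b):
    """Similarity of two fingerprints in [0, 1] (stand-in comparison)."""
    return SequenceMatcher(None, fp_a, fp_b).ratio()
```

Because repeated siblings collapse into a single pair, a list with many items yields a fingerprint only slightly longer than one with a few, which is the kind of shortening that makes comparison cheaper than comparing full tag vectors.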