A TSVM-Based Approach For Detection Of Phishing Webpages

Posted on: 2012-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: L J Zhao | Full Text: PDF
GTID: 2178330335953874 | Subject: Computer application technology
Abstract/Summary:
Phishing is a criminal trick for stealing victims' sensitive personal information by luring users to a forged web page designed to mimic the visual identity of a target page. As the number of phishing attacks keeps increasing, anomaly-based phishing web detection has attracted much attention as an anti-phishing solution. This paper proposes a new phishing webpage detection approach based on a semi-supervised learning method, the transductive support vector machine (TSVM). The approach takes into account both web image features and the DOM objects/properties of the page, and uses the TSVM to detect and classify phishing web pages.

Firstly, web image features are extracted to compensate for the weakness of phishing detection based only on the document object model (DOM). To obtain better web image segmentation, a quantum evolutionary algorithm and a clone operator are introduced to improve the spectral clustering used to divide the web image. Based on the segmentation results, web image features are extracted, including boundary shape, gray histogram, color histogram, and the spatial relationships between subgraphs. From the DOM objects, features of sensitive web information are examined, including the URL, forms, the SSL certificate, and so on.

Secondly, because the data that make up the web feature vector differ in format and contain redundancy, the web classifier needs more time to process the feature data. To solve this problem, this paper introduces an efficient kernel principal component analysis (KPCA) to normalize the feature vectors and reduce their dimension, so that the data meet the input requirements of the phishing web classifier.

These web page features are usually used to assess phishing pages with a supervised classifier, the support vector machine (SVM). However, an SVM trains its classifier only on a small number of poorly representative labeled samples, which cannot reflect the characteristics of unknown data. To avoid this drawback, we introduce the TSVM to train a classifier that takes into account the distribution information implicitly embodied in the large quantity of unlabeled samples. Compared with a phishing detection approach based on an SVM classifier that uses DOM object features, experimental results show that our method achieves better classification accuracy, and as an independent phishing detection approach it has strong applicability.
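
The abstract names URL and form features drawn from the DOM but does not list them individually. As a rough illustration only, the following Python sketch shows what a few such features might look like. The feature names, the `FormCollector` class, and the `extract_features` function are all assumptions made for this example, not the feature set used in the thesis; SSL certificate and web image features are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import ipaddress


class FormCollector(HTMLParser):
    """Hypothetical helper: records <form> actions and counts password inputs."""

    def __init__(self):
        super().__init__()
        self.form_actions = []
        self.password_inputs = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # The action attribute may be missing or valueless.
            self.form_actions.append(attrs.get("action") or "")
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_inputs += 1


def host_is_ip(host):
    """Return True when the host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url, html):
    """Build a small, illustrative feature dictionary for one page.

    These features are assumptions chosen for the example, not the
    thesis's actual feature vector.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    collector = FormCollector()
    collector.feed(html)

    # A form counts as "foreign" when its action names a different host.
    foreign_forms = 0
    for action in collector.form_actions:
        action_host = urlparse(action).hostname
        if action_host and action_host != host:
            foreign_forms += 1

    return {
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip(host)),
        "host_dot_count": host.count("."),
        "url_has_at": int("@" in url),
        "password_inputs": collector.password_inputs,
        "foreign_form_actions": foreign_forms,
    }
```

A dictionary like this would still need to be turned into a numeric vector, normalized, and reduced (the abstract uses KPCA for that step) before being passed to a classifier.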
Keywords/Search Tags:Phishing web detection, Spectral clustering, Features of web page, Kernel principal component analysis (KPCA), Transductive support vector machine (TSVM)