Research On The Recognition Of Phishing Websites Based On K-Means And SVM

Posted on: 2017-03-19; Degree: Master; Type: Thesis
Country: China; Candidate: J L Zhao; Full Text: PDF
GTID: 2308330485985185; Subject: Computer Science and Technology
Abstract/Summary:
Traditional blacklists that rely solely on URL matching are inadequate against today's massive number of phishing sites and must be combined with other means of identification. Many new identification methods have since emerged. Methods based on web page structure turn out to be unreliable and largely unusable because of the diversity of language expression. A variety of methods based on image recognition have also been derived, but they inherit the defects of image recognition: identification requires an exact match, so the phishing site must look exactly like the sample site it imitates. As a result, most fake shopping websites are not identified well, and recognition is very slow. Recognition based on Naive Bayes gives unstable results because of limitations in its working principle. This thesis therefore explores combining the K-Means and SVM algorithms: K-Means first assigns a website to a group, and if that group is one that phishing sites readily imitate, the characteristic parameters corresponding to that category are used to recognize the site. The method is also combined with the traditional URL blacklist mechanism and a grayscale page-matching module, which helps avoid misidentifying some newly created legitimate sites as phishing sites.
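The grouping-then-routing idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-dimensional feature vectors, the host names, the `route` function and its return labels are all hypothetical, and K-Means is written here as plain Lloyd's algorithm using only the Python standard library.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns k centroids as lists of numbers."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Each centroid becomes the mean of its group (unchanged if the group is empty).
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def route(host, features, centroids, imitated, blacklist):
    """Hypothetical routing step: blacklist first, then group-specific checking."""
    if host in blacklist:
        return "blacklisted"
    group = nearest(features, centroids)
    if group in imitated:
        # Assumption: groups that phishing sites tend to imitate get their own SVM.
        return f"svm-group-{group}"
    return "no-group-specific-check"

# Made-up page feature vectors forming two well-separated groups.
pages = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = kmeans(pages, k=2)
shopping_like = nearest([10, 10], centroids)  # pretend this group is often imitated
print(route("shop.example", [10, 10], centroids, {shopping_like}, {"phish.example"}))
```

Checking the blacklist before clustering keeps the cheapest test first; only sites that fall into a frequently imitated group go on to the more expensive group-specific classifier.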
To complete this identification method, the work is divided into four parts. First, a sample library is built by collecting typical phishing sites that are prevalent in a given season, obtaining their valid web content and its words, removing stop words and performing related operations, and analyzing a series of typical site characteristics. Second, the pages are divided into groups, and the resulting characteristics are analyzed and used to establish sample templates. Third, SVM classification is used to obtain the effective features. Last, link features are collected in real time when the target site is accessed, and the SVM algorithm is then used to calculate the site's credibility. Together, these four parts form a complete process for identifying phishing sites.
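As a rough illustration of the final step, the sketch below trains a linear SVM by stochastic subgradient descent on the regularized hinge loss and uses its signed decision value as a credibility score. The thesis does not specify its kernel, features, or training procedure, so everything here is an assumption: the two link features (scaled to [0, 1]), the training data, the label convention (+1 legitimate, -1 phishing), and the hyperparameters `lam`, `eta`, and `epochs`.

```python
import random

def decision_value(w, b, x):
    """Signed score w.x + b; here a positive value means 'looks legitimate'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    X: feature vectors, y: labels in {+1, -1}. Returns (w, b).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_value(w, b, X[i]) < 1
            # The regularizer shrinks w on every step.
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

# Made-up link features, e.g. two suspiciousness indicators scaled to [0, 1].
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.0], [0.9, 0.8]]
y = [+1, +1, -1, -1]  # +1 = legitimate, -1 = phishing (assumed convention)
w, b = train_linear_svm(X, y)
print(decision_value(w, b, [0.05, 0.05]), decision_value(w, b, [1.0, 0.9]))
```

With this label convention a larger decision value reads as higher credibility; a real system would more likely use a tested SVM library and calibrate the score rather than use the raw margin.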
Keywords/Search Tags:K-Means Algorithm, SVM, Phishing Website, Classification