The Application Of The Classification Method In Identifying The Website Phishing Data

Posted on: 2019-03-31   Degree: Master   Type: Thesis
Country: China   Candidate: C J Su   Full Text: PDF
GTID: 2348330569989341   Subject: Applied statistics
Abstract/Summary:
Classification is an important data analysis technique in data mining, and its applications span many fields. It has been studied extensively in artificial intelligence, machine learning, and pattern recognition, and many classification methods have been proposed. The data set used in this thesis is taken from UCI; the data were collected from www.phishtank.com, a free community site where users can submit, verify, track, and share phishing reports. Based on the differences in features between phishing sites and legitimate websites, 1353 samples were collected: 548 legitimate websites, 702 phishing sites, and 103 suspicious websites. The data set has 10 columns: columns 1-9 hold the site features (variables), and column 10 is the class label. We first examined the data with single-factor analysis and variable correlation analysis. This preliminary analysis showed that no single variable can classify the data on its own and that the dependence between variables is weak. We then applied K-nearest neighbor, random forest, support vector machine (SVM), naive Bayes, and BP (backpropagation) neural network methods to classify the website phishing data. The results indicate that the BP neural network is the best of these methods for this data set, because it has the lowest probability of misclassification.
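As a rough illustration of how one of the listed methods, K-nearest neighbor, can be scored by its misclassification rate, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the choice of k = 3, the Euclidean distance, the encoding of the nine features as -1, 0, or 1, and the toy rows themselves are all assumptions made for the example, and the resulting number says nothing about the real data set.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training rows."""
    # Rank training rows by squared Euclidean distance to x (hypothetical choice of metric).
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def misclassification_rate(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label differs from the given label."""
    wrong = sum(
        knn_predict(train_X, train_y, x, k) != y for x, y in zip(test_X, test_y)
    )
    return wrong / len(test_X)


# Toy rows with 9 features each; values and labels are invented for illustration.
train_X = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, 0, -1, -1, -1, -1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
]
train_y = [
    "legitimate", "legitimate",
    "phishing", "phishing",
    "suspicious", "suspicious",
]

test_X = [
    [1, 1, 0, 1, 1, 1, 1, 1, 1],
    [-1, -1, -1, 0, -1, -1, -1, -1, -1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0],
]
# The last label is deliberately at odds with its nearest neighbors,
# so the toy example produces one error.
test_y = ["legitimate", "phishing", "suspicious", "suspicious"]

rate = misclassification_rate(train_X, train_y, test_X, test_y, k=3)
print(f"misclassification rate: {rate:.2f}")
```

Comparing several classifiers in the way the abstract describes would amount to computing this same rate for each method on held-out data and preferring the method with the smallest value.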
Keywords/Search Tags:Website Phishing, Mutual Information, Data Mining, Algorithm, Classification