
The Research And Implementation Of Malicious Web Pages Detection From Search Engine Based On Decision Tree

Posted on: 2014-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhou
Full Text: PDF
GTID: 2268330425983782
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, network information is growing explosively, and search engines, which integrate resources, have become the primary way people access information. However, the large number of phishing web pages and malicious links exposes users to serious security risks, so keeping users from following malicious search links is of real practical significance. Existing defensive tools for search engines have limitations. This paper aims to widen the range of web pages that can be checked for search engines by taking advantage of machine learning, which is well suited to handling similar cases, and thereby to make the detection system more intelligent.

To determine the security of search engine web pages correctly and quickly, that is, to classify each page as benign or malicious, prediction rules are obtained from a classification model built with machine learning.

First, through the analysis of a large number of malicious and normal web pages, a variety of new features are selected for detecting malicious web pages, including the PageRank value and the number of search results from Google, Alexa traffic information, domain information, and the WOT reputation value. Compared with the features previously used for detecting malicious web pages, these features are more robust and authoritative, and they better separate malicious web pages from normal ones.

Second, several extraction techniques are used to obtain the selected features, and machine learning classification algorithms such as Naive Bayes, SVM (support vector machine), k-Nearest Neighbor, and decision trees are used to generate classification models from the web page feature set.
After weighted superposition is applied to the J48 decision tree model, which has the advantages of high classification performance and low complexity, the classification accuracy reaches 95.19%. The model can effectively evaluate the security of web pages and is suitable for fast classification of search engine web pages.

Finally, the Chrome browser is extended, and the decision tree model obtained through machine learning is applied to detecting search engine web pages. When the browser extension detects a search engine query, it uses asynchronous XMLHttpRequest calls to extract, for each search engine web page, the features used in the classification model, and the detection result is promptly shown next to the page using several icons. Extensive search tests on several popular search engines show that the extension is accurate and effective at detecting malicious web pages from any search engine.
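As a rough illustration of how a decision tree learner chooses a rule from page features, the sketch below computes the gain ratio of threshold splits, the criterion used by C4.5 (J48 is the name Weka gives its C4.5 implementation). It is a simplified sketch, not the thesis's model: the feature names `pagerank` and `wot_reputation`, the six sample rows, and their labels are made-up assumptions, and pruning and C4.5's other refinements are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, feature, threshold):
    """Gain ratio of the binary split rows[feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    p_left, p_right = len(left) / len(labels), len(right) / len(labels)
    info_gain = (entropy(labels)
                 - p_left * entropy(left) - p_right * entropy(right))
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return info_gain / split_info

def best_split(rows, labels):
    """Return (gain_ratio, feature, threshold) for the best single split."""
    best = (0.0, None, None)
    for feature in rows[0]:
        values = sorted({x[feature] for x in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            score = gain_ratio(rows, labels, feature, threshold)
            if score > best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical toy data: values and labels are invented for illustration only.
rows = [
    {"pagerank": 6, "wot_reputation": 90},
    {"pagerank": 5, "wot_reputation": 85},
    {"pagerank": 0, "wot_reputation": 70},
    {"pagerank": 1, "wot_reputation": 20},
    {"pagerank": 0, "wot_reputation": 10},
    {"pagerank": 4, "wot_reputation": 15},
]
labels = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

score, feature, threshold = best_split(rows, labels)
print(f"split on {feature} <= {threshold} (gain ratio {score:.3f})")
```

A full tree repeats this selection recursively on each side of the chosen split until the leaves are sufficiently pure. A browser extension would then only need to evaluate the resulting chain of threshold tests, which is consistent with the low complexity the abstract attributes to the decision tree model.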
Keywords/Search Tags: search engine, malicious webpage, machine learning, classification model, decision tree, browser extension