
Detecting malicious Webpages using content based classification

Posted on: 2012-10-01
Degree: M.S.
Type: Thesis
University: University of California, San Diego
Candidate: Bannur, Sushma Nagesh
Full Text: PDF
GTID: 2458390008991621
Subject: Web Studies
Abstract/Summary:
In this thesis, we propose a supervised learning approach to detecting malicious Webpages. We use features drawn from the URL, textual content, structural tags, page links, and visual appearance of a Webpage. First, we demonstrate an offline classification model that uses batch learning algorithms, and we evaluate the benefit of including each feature type. We then present an online classification model that uses online learning algorithms on a larger dataset. For both models, we use a live feed of labeled data collected from a large webmail provider. Against a base rate of 66%, we achieve 98% accuracy using a combination of URL and Webpage content features for classification. We observe that incorporating Webpage content features in addition to the URL features reduces the error rate by about 50%.
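The reported figures can be put side by side with a little arithmetic. The sketch below is only an illustration and rests on two assumptions that the abstract does not state explicitly: that the 66% base rate is the share of the larger class (so a classifier that always predicts that class is wrong 34% of the time), and that the "about 50%" reduction is measured relative to a URL-only classifier. The variable names are made up for this example.

```python
# Figures quoted in the abstract.
base_rate = 0.66          # assumed: share of the larger class in the labeled feed
combined_accuracy = 0.98  # accuracy with URL + Webpage content features
error_reduction = 0.50    # "about 50%" relative reduction in error rate

# Assumption: a trivial baseline that always predicts the larger class.
baseline_error = 1.0 - base_rate

# Error rate of the combined URL + content classifier.
combined_error = 1.0 - combined_accuracy

# Assumption: the reduction is relative to a URL-only classifier, so
# combined_error = url_only_error * (1 - error_reduction).
implied_url_only_error = combined_error / (1.0 - error_reduction)

print(f"majority-class baseline error: {baseline_error:.0%}")
print(f"URL + content error:           {combined_error:.0%}")
print(f"implied URL-only error:        {implied_url_only_error:.0%}")
```

Under these assumptions the URL-only error rate works out to roughly 4%, compared with 2% for the combined features and 34% for the trivial baseline. The 4% figure is inferred here, not quoted from the thesis.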
Keywords/Search Tags:Webpage, Content, URL, Features, Using, Classification