Realization Of The Image Spam Filtering System

Posted on: 2011-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J X Liu
GTID: 2178360302974599
Subject: Computer applications
Abstract/Summary:
Email is an efficient modern communication tool and has become indispensable in daily life, but the growing volume of spam email has become a serious problem. Automatic spam filtering has therefore attracted researchers in machine learning, text classification, information filtering, and related fields. Identifying spam by its text content is the main approach, and it has been gaining a competitive edge against text-based email spam. In response, spammers have begun embedding spam messages in images to circumvent text-based anti-spam filters. This technique, called "image spam", also consumes more storage and bandwidth, and it calls for dedicated study.

In this thesis, we first provide an overview of image spam, including its characteristics and the difficulties of detecting it. We also survey the state of the art in image spam detection, including SVM, naïve Bayes, logistic regression, and kNN classifiers.

Second, we introduce the extraction of text-based features and image features. For text-based features, a spam image usually contains more text than a normal image, and that text is often obscured to evade text filters. For image-based features, a spam image is usually synthetic, so its color saturation characteristics differ from those of a normal image.

Third, we propose a hybrid image spam filtering framework that detects spam images using both the extracted text features and the image features, combining the strengths of the two kinds of features. Our experimental results show significant improvements in accuracy over classifiers that use only text features or only image features, and the proposed approach remains robust to complex backgrounds and compression artifacts.
Keywords/Search Tags: Spam filtering, image spam, image features, text features
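
The hybrid idea described in the abstract (text features and color saturation features fed to one classifier) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the specific features (text density, a crude obfuscation ratio, mean saturation, share of highly saturated pixels), the 0.8 saturation threshold, and the toy training vectors are all assumptions made for illustration. The OCR step is assumed to have already produced `extracted_text`, and kNN is used only because it is one of the classifiers the abstract lists.

```python
import colorsys
import math
from collections import Counter


def saturation_features(pixels):
    """Return (mean saturation, fraction of pixels with saturation > 0.8).

    `pixels` is a list of (r, g, b) tuples with channels in 0..255.
    The 0.8 threshold is an illustrative assumption.
    """
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels]
    if not sats:
        return 0.0, 0.0
    mean_sat = sum(sats) / len(sats)
    high_frac = sum(s > 0.8 for s in sats) / len(sats)
    return mean_sat, high_frac


def is_obfuscated(token):
    """Crude heuristic: a token mixing letters with non-letters (e.g. 'V1AGRA')."""
    has_letter = any(c.isalpha() for c in token)
    has_other = any(not c.isalpha() for c in token)
    return has_letter and has_other


def text_features(extracted_text, image_area):
    """Return (characters per 1000 pixels, fraction of obfuscated tokens).

    `extracted_text` is assumed to come from an OCR step not shown here.
    """
    tokens = extracted_text.split()
    n_chars = sum(len(t) for t in tokens)
    density = 1000.0 * n_chars / image_area if image_area > 0 else 0.0
    ratio = sum(is_obfuscated(t) for t in tokens) / len(tokens) if tokens else 0.0
    return density, ratio


def hybrid_vector(extracted_text, pixels):
    """Concatenate text features and image features into one vector."""
    return list(text_features(extracted_text, len(pixels))) + list(saturation_features(pixels))


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (vector, label) pairs. Features are left unscaled
    here; a real system would normalize them first.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training vectors: [text density, obfuscation ratio, mean saturation,
# high-saturation fraction]. The direction of each difference is an assumption.
TOY_TRAIN = [
    ([12.0, 0.6, 0.90, 0.80], "spam"),
    ([10.0, 0.5, 0.85, 0.70], "spam"),
    ([15.0, 0.7, 0.95, 0.90], "spam"),
    ([0.5, 0.0, 0.30, 0.05], "ham"),
    ([1.0, 0.0, 0.40, 0.10], "ham"),
    ([0.0, 0.0, 0.20, 0.00], "ham"),
]
```

Calling `knn_predict(TOY_TRAIN, hybrid_vector(text, pixels))` would then label a new image. Concatenating the two feature groups before classification is only one way to combine them; training separate text and image classifiers and merging their scores is an equally plausible reading of "hybrid", and the abstract does not say which the thesis uses.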