
Research On Image Spam Filtering System

Posted on: 2016-07-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Liu
GTID: 2298330467975359 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of computer technology and the spread of e-mail, spammers often use image spam to distribute advertisements, pornography, fraud and reactionary content for illicit gain. Spam consumes network bandwidth, wastes storage resources, and can even pose hidden risks to public security management. Existing filtering systems are still imperfect, so research on image spam filtering technology is important and necessary.

This thesis designs a cascade spam filtering system; the cascade classifier model reduces the likelihood of misclassification. Image spam filtering mainly involves two tasks, feature extraction and classification of e-mail images, and the cascade filtering system is designed around both.

The first layer performs rough classification so that most normal e-mail images are identified early. It uses low-level image features combined with an SVM. After comparing and analyzing color, gradient and LBP features, this thesis proposes a new fused feature, gradient-LBP, which achieves better classification accuracy with the SVM.

The second layer performs detailed classification using finer image features: SIFT and GIST features are used to build a bag-of-words model. The LSH algorithm is introduced and improved, and the second, detailed filtering layer is implemented with it. The computational complexity and classification accuracy of the original and refined LSH algorithms are compared. A new text-locating method is also proposed that accurately locates text regions in an image; it combines Haar features, which are simple to represent and fast to compute, with the AdaBoost algorithm.

The third layer performs further classification, using text-recognition software to extract the information in the text regions of the spam image.
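The abstract does not specify how the layers hand images to one another. A minimal sketch of the general cascade control flow follows, assuming each layer either returns a final label or abstains so the image moves on to the next, costlier layer. The stage functions, the score names (`svm_score`, `bow_score`, `sensitive_hits`) and the thresholds are hypothetical placeholders, not values from the thesis.

```python
def cascade_classify(item, stages, default="ham"):
    """Apply stages in order; the first stage that returns a label decides.

    Each stage returns "ham", "spam", or None to defer to the next stage.
    Returns (label, index of the deciding stage, or None if all deferred).
    """
    for i, stage in enumerate(stages):
        label = stage(item)
        if label is not None:
            return label, i
    return default, None


# Hypothetical stages operating on precomputed scores (thresholds are placeholders)
def rough_layer(item):
    # Layer 1: confidently normal images exit early
    return "ham" if item["svm_score"] < -1.0 else None


def detailed_layer(item):
    # Layer 2: confident image-content spam is caught here
    return "spam" if item["bow_score"] > 0.8 else None


def text_layer(item):
    # Layer 3: always decides, based on sensitive-word hits in the OCR text
    return "spam" if item["sensitive_hits"] > 0 else "ham"


STAGES = [rough_layer, detailed_layer, text_layer]
```

Ordering cheap layers first means the expensive bag-of-words and OCR steps only run on the minority of images the rough layer cannot settle.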
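The abstract does not describe how gradient and LBP information are fused into gradient-LBP. One plausible reading, sketched here under that assumption, concatenates a magnitude-weighted gradient-orientation histogram with a normalized 256-bin basic LBP histogram, both computed on a grayscale image given as a list of pixel rows. The 8-bin orientation setting and all function names are illustrative choices, not the thesis's definitions.

```python
import math


def lbp_histogram(img):
    """Basic 3x3 LBP: normalized 256-bin histogram over interior pixels."""
    h, w = len(img), len(img[0])
    # Neighbour offsets, clockwise from the top-left pixel; each sets one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]


def gradient_histogram(img, bins=8):
    """Magnitude-weighted histogram of gradient orientations (central differences)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def gradient_lbp_feature(img, bins=8):
    """Hypothetical fusion: concatenate the two normalized histograms."""
    return gradient_histogram(img, bins) + lbp_histogram(img)
```

The resulting vector would then be passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`) for the first-layer rough classification.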
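In the second layer, local descriptors such as SIFT or GIST vectors must be mapped to visual words to build the bag-of-words histogram, and LSH can replace an exhaustive nearest-codeword search. The sketch below uses generic random-hyperplane LSH (one sign bit per random projection) with a brute-force fallback for empty buckets. It is not the thesis's refined LSH, whose details the abstract does not give, and the codebook is assumed to have been learned beforehand (for example by k-means).

```python
import random


class HyperplaneLSH:
    """Random-hyperplane LSH for approximate nearest-codeword lookup (generic scheme)."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}
        self.codewords = []

    def _hash(self, v):
        # One bit per hyperplane: which side of the plane v lies on
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 for p in self.planes)

    def index(self, codewords):
        self.codewords = [list(c) for c in codewords]
        self.buckets = {}
        for i, c in enumerate(self.codewords):
            self.buckets.setdefault(self._hash(c), []).append(i)

    def nearest(self, v):
        # Compare only against codewords in the query's bucket;
        # fall back to an exhaustive search when that bucket is empty.
        candidates = self.buckets.get(self._hash(v)) or range(len(self.codewords))
        return min(candidates,
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(self.codewords[i], v)))


def bow_histogram(descriptors, lsh):
    """Count how many descriptors fall on each (approximately nearest) visual word."""
    hist = [0] * len(lsh.codewords)
    for d in descriptors:
        hist[lsh.nearest(d)] += 1
    return hist
```

Because the true nearest codeword can land in a different bucket, the returned word is approximate: more hyperplanes give smaller buckets and faster lookups but more misses, which is the kind of complexity versus accuracy trade-off the thesis measures between LSH and its refined version.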
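Haar features are fast to compute because an integral image lets any rectangle sum be read with four lookups. A minimal sketch of a two-rectangle (top minus bottom) Haar feature, the kind of edge response horizontal text strokes produce, is shown below. The function names and the single feature type are illustrative assumptions, since the abstract does not list the Haar templates used.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_top_minus_bottom(ii, x, y, w, h):
    """Two-rectangle Haar feature: top half sum minus bottom half sum (h even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)
```

The integral image is built once per image, after which every Haar feature at every position and scale costs a constant number of lookups, which is what makes exhaustive sliding-window text search affordable.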
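AdaBoost combines many weak classifiers, typically one-feature threshold stumps over Haar responses, into a strong classifier. The sketch below is textbook discrete AdaBoost with decision stumps on generic feature vectors with labels in {-1, +1}; it illustrates the algorithm the abstract names, not the thesis's training setup, and the default of 10 rounds is arbitrary.

```python
import math


def _stump(x, j, thr, sign):
    # Weak classifier: +sign if feature j reaches the threshold, else -sign
    return sign if x[j] >= thr else -sign


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; labels y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if _stump(xi, j, thr, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Increase the weight of misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * yi * _stump(xi, j, thr, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict_adaboost(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * _stump(x, j, thr, sign) for alpha, j, thr, sign in model)
    return 1 if score >= 0 else -1
```

In a text locator, each row of `X` would hold Haar responses for one candidate window, with +1 for text and -1 for background.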
The extracted text is then compared against a list of sensitive words to reach the final classification.

The system was implemented in MATLAB and VS2008. Training and testing used the public Spam Archive image database, a dataset the author collected from the Internet and e-mail, and a synthetic (man-made) image dataset. Finally, the performance of each filtering layer is analyzed, and the experimental results show that the cascade filtering system achieves high accuracy.
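The abstract does not say which matching rule or word list the third layer uses. A minimal sketch, assuming simple case-insensitive substring matching after whitespace normalization of the OCR output; the word list, the `min_hits` threshold and the function names are hypothetical.

```python
import re


def find_sensitive_words(ocr_text, sensitive_words):
    """Return the sensitive words occurring in the OCR output (case-insensitive)."""
    # Collapse runs of whitespace, since OCR often inserts spurious spaces or newlines
    normalized = re.sub(r"\s+", " ", ocr_text).lower()
    return sorted({w for w in sensitive_words if w.lower() in normalized})


def is_spam_by_text(ocr_text, sensitive_words, min_hits=1):
    """Final decision: spam if at least min_hits sensitive words are found."""
    return len(find_sensitive_words(ocr_text, sensitive_words)) >= min_hits
```

Substring matching is the simplest choice; OCR errors such as split or misread characters would in practice call for fuzzier matching, which this sketch does not attempt.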
Keywords/Search Tags: spam, feature extraction, SVM, LSH, cascade classifier