The Design of a Cascade-Type Image Spam Filtering System

Posted on: 2018-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: L H Dai
GTID: 2428330542976977
Subject: Electronic and communication engineering
Abstract/Summary:
With the popularity of the Internet, more and more e-mail has shifted from plain text to a mix of text and images. Traditional spam filtering methods cannot effectively distinguish and extract image and text information. Many studies have addressed this problem and some progress has been made, but the desired results have not been achieved.

Current image spam filtering methods have the following disadvantages. Methods that extract the text embedded in an image and then apply traditional spam filtering are strongly affected by image resolution and other interference factors, and their execution efficiency is low. Methods based on image file metadata do not differentiate the images in a message, because the metadata correlates poorly with the image content, so their false alarm rate is high. Methods based on the characteristics of the image itself extract image features and classify them with machine learning and other algorithms; although they offer some real-time capability and stability, the filtering performance of any single image feature across different image types is still not ideal.

To solve these problems, this thesis proposes the research and design of an image spam filtering system based on the SIFT algorithm and a convolutional neural network. The main work is as follows.

Image features are extracted with the SIFT algorithm, and the K-means algorithm is used to construct a bag-of-words model and form a frequency histogram. A convolutional neural network is first trained on the CIFAR-10 data set until it converges and is then trained on the SPAM ARCHIVE standard image library. The network's final classifier layer is replaced with a fully connected layer to form a feature extractor based on the convolutional neural network, and the output of this fully connected layer is taken as the feature extracted by the network. The frequency histogram and the CNN-based feature are linearly combined to obtain the "SIFT-CNN fusion feature". Compared with the traditional SIFT feature, the SIFT-CNN fusion feature has a higher computational complexity but expresses the image better.

Images are classified with the SVM algorithm on the basis of the "SIFT-CNN fusion feature". The classification results are tested on the standard image database, and after several rounds of testing the kernel function and penalty parameter that give a high classification accuracy are determined.

The wavelet transform is used to binarize the image, and OCR is used to extract the text information. An improved KNN algorithm then compares the extracted text with a sensitive-word text, and the spam is further subdivided into advertising, illegal, and other categories. Compared with the general KNN algorithm, the improved KNN algorithm maintains accuracy while improving efficiency.

In a mixed programming environment of MATLAB 2014a and VS2013, the SPAM ARCHIVE standard image library is used to train and test the system. Through continuous optimization of system performance, the cascaded filtering system achieves a higher classification accuracy and a faster classification speed. The filtering system designed in this thesis can distinguish image spam accurately and effectively, which provides a reference for the research and design of image spam filtering systems. The "SIFT-CNN fusion feature" proposed in this thesis can also provide a reference for future research on spam image features.
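The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, which used MATLAB and VS2013. It assumes the visual-word codebook (the K-means cluster centres) and the CNN feature vector are already available. The function names, the weight `alpha`, the L2 normalisation, and the reading of "linear combining" as a weighted concatenation are all hypothetical choices made for illustration, and the toy vectors are far smaller than real 128-dimensional SIFT descriptors.

```python
from math import dist, sqrt

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and count hits."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Normalise so images with different numbers of keypoints are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def l2_normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def fuse(sift_hist, cnn_feat, alpha=0.5):
    """Weighted concatenation of the two features (assumed form of the fusion)."""
    a = [alpha * x for x in l2_normalise(sift_hist)]
    b = [(1.0 - alpha) * x for x in l2_normalise(cnn_feat)]
    return a + b

# Toy data: three visual words and four 2-D "descriptors".
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.1, 0.9]]

hist = bow_histogram(descriptors, codebook)
fused = fuse(hist, cnn_feat=[3.0, 4.0])
print(hist)
print(len(fused))
```

The fused vector would then be passed to a classifier such as the SVM mentioned above. The weight `alpha` would need tuning on held-out data, in the same way the abstract describes tuning the kernel function and penalty parameter.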
Keywords/Search Tags: Image Spam, Fusion Feature, Support Vector Machine, Bag of Words, Optical Character Recognition