On Technology Of Image-Based Spam Filtering

Posted on: 2012-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Song
Full Text: PDF
GTID: 2178330335480206
Subject: Computer software and theory
Abstract/Summary:
This thesis surveys current techniques for filtering image-based spam and for image segmentation, and analyzes the performance of several spam detection algorithms. Based on the types of image-based spam, it proposes two algorithms for locating the text range in an image and a final image spam detection algorithm based on two-rank filtering. The detection algorithm combines features of the text range with keyword matching and applies them in a two-rank filtering scheme, which effectively improves the detection rate for image-based spam.

The first location algorithm, based on edge detection and mathematical morphology, targets text spam with a simple background. First, edges are detected in the color image, and background noise points are removed by thresholding the resulting grayscale edge image. Second, candidate text ranges are extracted with morphological operations. Finally, the connected components of the text are labeled, which completes the location of the text range.

The second location algorithm, based on wavelets, targets complex spam that contains both pictures and text. It is a composite method for detecting text in complicated images: coarse detection and precise location are carried out with a two-resolution wavelet transform, and morphological operations are again used to locate the text accurately.

To improve the detection rate, a two-rank filtering algorithm for image spam detection is proposed. The first rank has two stages. In the first stage, the most discriminative features of the text range are selected and extracted, and an SVM classifier judges whether the image is spam. In the second stage, the words in the text range of the image are extracted and analyzed, and keyword matching judges whether the image is spam.
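
As a rough illustration of the edge-and-morphology text range location described above, the following Python sketch runs the three steps (edge thresholding, morphological extraction, component labeling) on a small grayscale grid. It is not the thesis's implementation: it assumes a grayscale input instead of color edge detection, uses simple forward differences as the edge operator, and the function names, the threshold of 120, the horizontal closing radius, and the minimum area are all illustrative choices.

```python
from collections import deque


def edge_magnitude(img):
    """Sum of absolute forward differences in x and y (assumed edge operator)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


def threshold(img, t):
    """Keep only strong edges; weak background spots fall below t."""
    return [[1 if v >= t else 0 for v in row] for row in img]


def dilate(mask, rx):
    """Binary dilation with a horizontal element of width 2*rx + 1."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[y][max(0, x - rx):min(w, x + rx + 1)]) else 0
             for x in range(w)] for y in range(h)]


def erode(mask, rx):
    """Binary erosion with the same element (window clipped at the border)."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(mask[y][max(0, x - rx):min(w, x + rx + 1)]) else 0
             for x in range(w)] for y in range(h)]


def components(mask):
    """Label 4-connected components; return (x0, y0, x1, y1, area) for each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append((x0, y0, x1, y1, area))
    return found


def locate_text_ranges(img, edge_threshold=120, rx=2, min_area=8):
    """Edge threshold -> horizontal closing -> component boxes (illustrative defaults)."""
    edges = threshold(edge_magnitude(img), edge_threshold)
    closed = erode(dilate(edges, rx), rx)  # closing merges neighbouring strokes
    return [(x0, y0, x1, y1) for x0, y0, x1, y1, area in components(closed)
            if area >= min_area]
```

Calling `locate_text_ranges` on a list-of-lists grayscale image returns bounding boxes as `(x0, y0, x1, y1)` tuples. The horizontal structuring element is a design choice for this sketch: it joins the strokes of one text line without merging separate lines stacked above each other.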
After the first rank of filtering, some spam may still be misclassified as normal e-mail, so a second rank is needed. The second rank analyzes the results of the first rank in order to arrive at the correct classification. Experiments show that two-rank filtering effectively improves the detection rate for image-based spam, and that the text range location and image spam detection algorithms proposed in this thesis are robust and accurate.
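
The two-rank idea can be sketched in Python as a small cascade. The abstract does not say how the two first-rank judgments are combined or what rule the second rank applies, so everything specific here is an assumption made for illustration: a linear decision function stands in for the trained SVM, the first rank flags spam if either the classifier or exact keyword matching fires, and the second rank re-examines passed messages using a borderline-score check plus fuzzy keyword matching. The keyword list, the character substitution table, the `margin` and `cutoff` values, and all function names are hypothetical.

```python
import difflib
import re

SPAM_KEYWORDS = ["lottery", "mortgage", "viagra"]  # hypothetical keyword list

# Assumed mapping that undoes a few common character substitutions.
_SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})


def normalize(word):
    """Lower-case, undo substitutions, and drop anything that is not a letter."""
    return re.sub(r"[^a-z]", "", word.lower().translate(_SUBSTITUTIONS))


def keyword_hits(words):
    """Count extracted words that exactly match a keyword after normalization."""
    return sum(1 for w in words if normalize(w) in SPAM_KEYWORDS)


def fuzzy_hits(words, cutoff=0.8):
    """Count words that closely resemble a keyword (tolerates recognition errors)."""
    hits = 0
    for w in words:
        w = normalize(w)
        if w and difflib.get_close_matches(w, SPAM_KEYWORDS, n=1, cutoff=cutoff):
            hits += 1
    return hits


def decision_score(features, weights, bias):
    """Stand-in for an SVM decision function: w . x + b (positive means spam)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(features, words, weights, bias, margin=0.5):
    """Return (label, rank) where rank is the stage that made the decision."""
    score = decision_score(features, weights, bias)

    # First rank: text-range features, then exact keyword matching.
    if score > 0 or keyword_hits(words) > 0:
        return "spam", 1

    # Second rank (assumed rule): revisit messages the first rank let through.
    borderline = score > -margin
    if borderline and fuzzy_hits(words) > 0:
        return "spam", 2
    return "normal", 2
```

Here `features` is the feature vector extracted from the located text range and `words` is the list of strings recognized inside it. Both are supplied by the caller, since feature extraction and text recognition are outside this sketch.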
Keywords/Search Tags: image spam, spam filtering, text range location, wavelet, two-rank filtering