Image Spam Filter

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Hou
GTID: 2248330398472065
Subject: Signal and Information Processing
Abstract/Summary:
Image spam has become a new obfuscation method for bypassing conventional text-based spam filters. To evade filtering, spammers add multiple forms of interference, such as noise, variation, and distortion, to their images. With the explosive growth of image spam, academia and industry have proposed a number of image spam filtering techniques. Industrial spam filtering systems require a very low false alarm rate, because losing legitimate mail can cause great losses to the user. This thesis analyzes the characteristics of image-based spam and the methods used to filter it.

First, this thesis gives an overview of image spam filtering technologies, covering the definition, types, and characteristics of image spam. To ensure efficient and precise filtering, it designs a hierarchical image spam filtering system consisting of two parts: a front-end filtering system and a back-end analysis system.

Second, it proposes a new image spam filtering method based on the observation that spam is sent repeatedly and that the contents of these messages closely resemble one another. We extract sub-block color histograms and ORB features as image features and use a scalable vocabulary tree to detect image spam. The system is tested on Mark Dredze's dataset and our own Chinese image spam corpus. Experimental results demonstrate that the proposed method achieves good accuracy with a false positive rate below 0.01%.

Third, this thesis proposes a text detection component for the back-end analysis system. Building on an analysis of existing text detection technologies and the characteristics of Chinese characters, we propose a text area detection algorithm for image spam based on corner detection and edge features. The experimental results show that the algorithm performs well in image spam filtering.
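As an illustration of the sub-block color histogram idea mentioned in the abstract, the following minimal Python sketch splits an image into a grid of blocks, builds a coarse per-channel histogram for each block, and compares two images by histogram intersection. It is not the thesis's implementation: the function names, the 2x2 grid, the 4 bins per channel, and the use of histogram intersection as the similarity measure are all assumptions made for this example, and the ORB features and scalable vocabulary tree are left out.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each value in 0..255
Image = Sequence[Sequence[Pixel]]     # rows of pixels


def subblock_histogram(image: Image, grid: int = 2, bins: int = 4) -> List[float]:
    """Concatenate per-block, per-channel color histograms.

    The grid size and bin count are illustrative assumptions.
    The returned feature vector is scaled so that it sums to 1.
    """
    height, width = len(image), len(image[0])
    feature: List[float] = []
    for by in range(grid):
        for bx in range(grid):
            # Integer block bounds that together cover the whole image.
            y0, y1 = by * height // grid, (by + 1) * height // grid
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            hist = [0.0] * (3 * bins)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for channel, value in enumerate(image[y][x]):
                        hist[channel * bins + value * bins // 256] += 1.0
            # Normalize each block so every block carries equal weight.
            total = (sum(hist) or 1.0) * grid * grid
            feature.extend(h / total for h in hist)
    return feature


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1 means the two histograms are identical."""
    return sum(min(x, y) for x, y in zip(a, b))


# Toy 4x4 images: a red image, a copy with one pixel altered, and a blue image.
red = [[(255, 0, 0)] * 4 for _ in range(4)]
noisy = [list(row) for row in red]
noisy[0][0] = (0, 0, 255)
blue = [[(0, 0, 255)] * 4 for _ in range(4)]

f_red = subblock_histogram(red)
print(histogram_intersection(f_red, subblock_histogram(noisy)))
print(histogram_intersection(f_red, subblock_histogram(blue)))
```

Because each block is histogrammed separately, a local change such as an added noise patch affects only the blocks it touches, so a near-duplicate of a known spam image still scores close to 1. A hypothetical filter could flag an image when its score against a known spam image exceeds a threshold chosen to keep false positives low.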
Keywords/Search Tags: Image Spam, Near-Duplicate Detection, Text Detection, ORB, Scalable Vocabulary Tree