Research On Spam-filtering Method Based On Visual Features Analysis

Posted on: 2008-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: G B Gu
Full Text: PDF
GTID: 2178360272969302
Subject: Computer application technology
Abstract/Summary:
With the widespread use of email, spam has been growing faster and faster. According to vnunet.com, the total volume of spam on the Internet will double in the coming months. Spam seriously threatens the security of computer systems and increasingly disrupts people's daily work. It has aroused widespread concern in the industry, and fighting spam has become a pressing, practical topic of international interest.

Email filtering is one of the key anti-spam technologies. Current spam-filtering techniques fall into four main categories: those based on the IP address, those based on the envelope or header of the email, those based on the content, and those based on the behavior of the email recipient. These methods have played a certain role in spam filtering. However, when spammers use images to mask deceptive or junk messages, current filtering methods become ineffective.

After analyzing a large collection of spam emails containing images, this thesis proposes an anti-spam method based on visual feature analysis, which extracts useful visual features to filter spam and effectively counters the trick of hiding junk information in images. It also presents the system architecture and the detailed design of the modules and workflow, together with the method for extracting visual features.

To enhance the performance of the system, the thesis studies the techniques used in the spam-filtering system, including MIME decoding, text location in images, and the classification algorithm. Based on an analysis of current text location techniques, it uses sliding-window positioning, a text-region search algorithm, and post-processing to improve location accuracy. After comparing the merits of current classification algorithms, it adopts a one-class support vector machine as the spam classification algorithm.

Finally, a prototype spam-filtering tool implementing the method was built with Visual C++ 6.0 and Libsvm-2.82, and a spam-filtering simulation was run on a 163 mailbox over the POP3 protocol. In terms of classification performance, filtering capability, and filtering time, the experimental results demonstrate that the proposed method achieves a satisfactory filtering effect, with a higher detection rate and fewer false positives.
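
As an illustration of the MIME decoding step mentioned above, the following is a minimal sketch in Python using only the standard `email` package. It is not the thesis's Visual C++ implementation; the function names `extract_images` and `build_sample`, the sample addresses, and the fake image bytes are all hypothetical and exist only to show how image parts can be pulled out of a message before any visual feature analysis.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from typing import List, Tuple


def extract_images(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (content_type, decoded_bytes) for every image part of a MIME message."""
    msg = message_from_bytes(raw_message, policy=default)
    images = []
    for part in msg.walk():  # visits the container and every nested part
        if part.get_content_maintype() == "image":
            # decode=True undoes the transfer encoding (e.g. base64)
            payload = part.get_payload(decode=True)
            if payload:
                images.append((part.get_content_type(), payload))
    return images


def build_sample() -> bytes:
    """Build a small test message with one image attachment (hypothetical data)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "receiver@example.com"
    msg["Subject"] = "demo"
    msg.set_content("See the attached picture.")
    msg.add_attachment(
        b"GIF89a-fake-image-bytes",  # placeholder, not a real image
        maintype="image",
        subtype="gif",
        filename="offer.gif",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for content_type, data in extract_images(build_sample()):
        print(content_type, len(data))
```

In a pipeline like the one the abstract describes, the decoded bytes would then be handed to the text location and classification stages; those stages are not sketched here.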
Keywords/Search Tags: Spam, Visual features, MIME decoding, Text location in images, One-class Support Vector Machine