
Research On Spam Filtering Technology

Posted on: 2008-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Sun
GTID: 2178360212981447
Subject: Computer application technology
Abstract/Summary:
E-mail has become a major means of communication in daily work and life, but the flood of spam seriously undermines its usefulness, so distinguishing spam from legitimate mail is an important task.

Much research on spam filtering has been carried out in recent years. Measures such as blacklists, whitelists, and manually written rules are widely used, but they have clear limitations. With the development of machine learning, text categorization, and information filtering, content-based analysis of e-mail has become an active research topic in spam filtering.

Based on an in-depth investigation of a large number of recent spam samples, this thesis summarizes the deceptive techniques commonly used by spammers. Drawing on a large body of anti-spam literature and data from China and abroad, it analyzes existing anti-spam techniques, in particular content-based spam filtering methods. For the Naïve Bayes algorithm, which is widely used in spam filtering, an improved method that incorporates N-gram theory is applied. The thesis also studies mail signatures and designs a structure-based two-layer filtering model. Experiments show that the improved method, applied within the two-layer filtering model, performs better in spam categorization and filtering: both the spam misclassification rate and the ham misclassification rate are markedly reduced. Finally, a framework for a spam filtering system is designed.
Keywords/Search Tags: spam filtering, text categorization, Naïve Bayes, signature
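
The abstract does not say how N-grams are combined with Naïve Bayes, so the following is only a minimal illustrative sketch, not the thesis's method. It assumes a multinomial Naïve Bayes model over word unigrams plus word bigrams with Laplace (add-one) smoothing. The names `ngrams` and `NgramNaiveBayes` and the toy training messages are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Return the word unigrams of text followed by its word n-grams."""
    words = text.lower().split()
    grams = list(words)
    for i in range(len(words) - n + 1):
        grams.append(" ".join(words[i:i + n]))
    return grams


class NgramNaiveBayes:
    """Toy multinomial Naive Bayes over unigram + n-gram features (assumed design)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n              # n-gram order (assumption: word-level bigrams)
        self.alpha = alpha      # Laplace smoothing constant
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()   # number of training messages per class
        self.vocab = set()      # every feature seen during training

    def train(self, text, label):
        feats = ngrams(text, self.n)
        self.counts[label].update(feats)
        self.docs[label] += 1
        self.vocab.update(feats)

    def log_score(self, text, label):
        # Log prior; requires at least one training message for this class.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
        for g in ngrams(text, self.n):
            if g in self.vocab:  # features never seen in training are skipped
                score += math.log((self.counts[label][g] + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda c: self.log_score(text, c))


clf = NgramNaiveBayes(n=2)
clf.train("win free money now", "spam")
clf.train("free prize claim now", "spam")
clf.train("meeting schedule for monday", "ham")
clf.train("please review the project report", "ham")
print(clf.classify("claim free money"))
print(clf.classify("project meeting monday"))
```

Adding bigrams lets the classifier weigh short phrases such as "free money" as features in their own right, which a unigram-only model cannot do. A real filter would need far more training data and a proper tokenizer.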