Parallel Filtering Model Of Spam And Research And Implementation Of The Arithmetic

Posted on: 2008-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: F P Wang
Full Text: PDF
GTID: 2178360212485191
Subject: Communication and Information System
Abstract/Summary:
Electronic mail (e-mail) has become one of the fastest and most economical means of communication. At the same time, the growing volume of junk mail (also called "spam") has created a need for e-mail filtering. Current anti-spam measures commonly include blacklists and whitelists, manually written rules, and keyword-based content filtering. Another approach applies automated text categorization and information filtering, so that a filtering system can learn directly from a user's own mail. Text categorization algorithms such as Naïve Bayes, kNN, decision trees and boosting can all be applied to spam filtering. However, the effectiveness of Naïve Bayes is limited by the assumptions built into the algorithm, while the other algorithms are more effective but more expensive to compute. To address this trade-off, we propose applying a sliding window, analogous to the "pipeline" concept in computing, and we call the resulting approach parallel filtering. Experiments on a public e-mail corpus show that the approach is effective.

The thesis is organized as follows. It first summarizes the current state of spam filtering. It then introduces the common anti-spam approaches and techniques, with particular attention to spam filtering techniques. Next, it analyzes the Naïve Bayesian classifier and introduces a sliding window to address the limitations of that algorithm, which yields the parallel filtering model. It then describes the design and implementation of the model, including the details of its main modules. Finally, it reports experiments on a public e-mail corpus and on the author's own e-mail collection that test the performance of the model.
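
The abstract does not specify how the window is combined with the Naïve Bayes classifier, so the following is only an illustrative sketch of one possible reading, not the thesis's implementation. It assumes a multinomial Naïve Bayes model with add-one smoothing, scores each overlapping window of tokens independently (so windows could in principle be scored in parallel), and averages the per-window log-odds. The names NaiveBayes, windows and classify, the window size, and the averaging rule are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, tokens, label):
        # label must be "spam" or "ham"; train at least one message of each
        # class before scoring, otherwise a class prior would be zero.
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_odds(self, tokens):
        """Return log P(spam | tokens) - log P(ham | tokens)."""
        total_docs = sum(self.doc_counts.values())
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                log_p += math.log((counts[tok] + 1) / denom)
            score += sign * log_p
        return score


def windows(tokens, size):
    """Yield every run of `size` consecutive tokens, moving one token at a time.

    A message shorter than `size` yields a single window holding all of it.
    """
    last_start = max(len(tokens) - size, 0)
    for start in range(last_start + 1):
        yield tokens[start:start + size]


def classify(model, tokens, size=4):
    """Average the per-window log-odds; a positive mean is labelled spam.

    The averaging rule is an assumption made for this sketch.
    """
    scores = [model.log_odds(w) for w in windows(tokens, size)]
    return "spam" if sum(scores) / len(scores) > 0 else "ham"


# Toy training data, invented purely for illustration.
model = NaiveBayes()
model.train("win free money now".split(), "spam")
model.train("free prize click now".split(), "spam")
model.train("meeting schedule for monday".split(), "ham")
model.train("project report attached today".split(), "ham")

print(classify(model, "claim your free money prize now".split()))
print(classify(model, "project meeting monday".split()))
```

Because each window is scored without reference to the others, the per-window calls to log_odds could be distributed across workers; whether the thesis does this is not stated in the abstract.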
Keywords/Search Tags: spam filtering, text categorization, Naïve Bayes, parallel filtering, sliding window