The Analysis And Implementation Of Spam-Filtering System Based On Bayesian Algorithm

Posted on: 2010-07-20  Degree: Master  Type: Thesis
Country: China  Candidate: Z Z Ceng
GTID: 2178360278466002  Subject: Computer Science and Technology
Abstract/Summary:
The development of the Internet has brought users great convenience. It has also created a new problem: massive volumes of spam, which cause enormous economic damage. Filtering spam out of incoming email has therefore become a common concern for email service providers and for large numbers of email users. This is the so-called "anti-spam" problem. Spam filtering resembles text categorization to a certain degree, but the two cannot simply be equated. Misclassifying a legitimate email as spam can do more damage than misclassifying spam as legitimate email. This thesis therefore uses the Bayesian algorithm to filter spam and builds the Spamfilter spam-filtering system on it. The thesis first introduces the principles of the email system, the common email transfer protocols, and the email message format. It then analyzes the Bayesian spam-filtering system. The system supports email text in MIME format and uses large, pre-categorized collections of spam and legitimate email as training sets. From these it derives a characteristic model for each class of email. It then performs machine learning based on that model and classifies incoming email as spam or legitimate.
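The training and classification steps described above can be sketched as a multinomial naive Bayes classifier. This is a minimal sketch under stated assumptions, not the Spamfilter implementation. The following are all hypothetical choices made for illustration:

- the regex tokenizer;
- Laplace (add-one) smoothing;
- the class name `NaiveBayesSpamFilter`;
- the default `spam_threshold` of 0.9.

The classifier scores P(spam | w1..wn) ∝ P(spam) · ∏ P(wi | spam) in log space, and likewise for the legitimate ("ham") class. It then normalizes the two scores. Setting the threshold well above 0.5 means a message is labelled spam only on strong evidence. This reflects the higher cost of misfiling legitimate mail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch only)."""

    def __init__(self, spam_threshold=0.9):
        # A threshold above 0.5 biases decisions toward "ham", because a
        # legitimate message misfiled as spam is the costlier error.
        self.spam_threshold = spam_threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate per-class token frequencies and document counts.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        # Assumes both classes have at least one training message.
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(class) + sum of log P(token | class) with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise the two log scores into P(spam | text) without overflow.
        top = max(log_scores.values())
        spam_weight = math.exp(log_scores["spam"] - top)
        ham_weight = math.exp(log_scores["ham"] - top)
        return spam_weight / (spam_weight + ham_weight)

    def classify(self, text):
        return "spam" if self.spam_probability(text) >= self.spam_threshold else "ham"


if __name__ == "__main__":
    demo = NaiveBayesSpamFilter()
    demo.train("win cash prize now", "spam")
    demo.train("cheap pills win money", "spam")
    demo.train("meeting agenda for monday", "ham")
    demo.train("please review the project report", "ham")
    print(demo.spam_probability("win cash now"), demo.classify("win cash now"))
```

With the four toy training messages in the demo, "win cash now" scores about 0.93. It is therefore flagged as spam at the default threshold but let through at 0.99. Raising the threshold trades missed spam for fewer legitimate messages being misfiled.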
The system can be used as a plug-in for procmail and can process newly received email at the client end of the email system. My responsibilities in developing the system were as follows:
(1) becoming familiar with the email format, including the format defined by RFC 822 and the MIME format, which supports binary data;
(2) becoming familiar with the current state of junk-mail filtering technologies;
(3) understanding the basic principles of the Bayesian algorithm as applied to junk-mail filtering;
(4) designing and implementing several modules of the system, including the gathering module, command-analysis module, mail-processing module, sorting module and training module.
Experiments show that the resulting system filters spam comparatively well.
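Before classification, a mail-processing step must turn a raw RFC 822/MIME message into plain text. A minimal sketch using Python's standard `email` package follows. The function name `extract_text` is hypothetical, as is the choice to keep only the Subject header and inline `text/plain` parts; neither is taken from the thesis. `get_content()` undoes the Content-Transfer-Encoding (base64 or quoted-printable) and the declared charset. Binary attachments are skipped rather than tokenized.

```python
from email import policy
from email.parser import BytesParser


def extract_text(raw_message: bytes) -> str:
    """Return the Subject plus all inline text/plain bodies of an RFC 822/MIME message."""
    # policy.default yields EmailMessage objects with the modern content-manager API.
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    pieces = [str(msg.get("Subject", ""))]
    # walk() visits the message itself and then every nested MIME part, depth first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers such as multipart/mixed carry no body text themselves
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            # get_content() decodes base64/quoted-printable and the declared charset.
            pieces.append(part.get_content())
    return "\n".join(pieces)
```

In a procmail-driven setup, a filter script could read the raw message bytes from standard input (`sys.stdin.buffer.read()`) and pass them to `extract_text` before scoring. How the thesis's own plug-in receives messages is not described here.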
Keywords/Search Tags: E-mail, Text Categorization, Spam, Bayesian Algorithm