
Research And Implementation Of The Anti-spam System Based On Bayesian Algorithm

Posted on: 2010-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Lin
GTID: 2178360275499895
Subject: Computer application technology
Abstract/Summary:
With the development and popularization of the Internet, e-mail has been widely adopted. It brings convenience to people but also a new problem: a large volume of spam. Spam increasingly affects people's daily lives, so research on spam filtering is of great significance.

This thesis first outlines e-mail and filtering technology, then introduces the Bayesian principle and applies it to spam filtering. To extract e-mail features effectively and improve filtering performance, the thesis introduces a segmentation algorithm and a language model based on N-Gram. An improved N-Gram segmentation algorithm is then proposed, together with an improved Bayesian filtering model based on the N-Gram model. The experimental results show that the improved approach is effective for spam filtering.

Feature selection is a very important process in content-based spam filtering and substantially improves the efficiency and precision of filtering. This thesis analyzes the disadvantages of IG (information gain) and CHI (the chi-square statistic) as applied to spam filtering and improves on them. However, these evaluation functions only assess the relationship between a feature and a class; they neglect the relationships among features. Clustering is therefore applied to reduce redundancy, which is measured by the mutual information among features.

Finally, an anti-spam system based on the Bayesian algorithm is designed and implemented.
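To make the combination of N-Gram features and Bayesian classification concrete, the following is a minimal sketch of a baseline filter. It is not the improved algorithm proposed in the thesis, whose details the abstract does not give. It assumes character bigrams as features, a multinomial naïve Bayesian model, and Laplace smoothing; the names `char_ngrams` and `NaiveBayesSpamFilter` and the toy training messages are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of the text, whitespace removed."""
    compact = "".join(text.lower().split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


class NaiveBayesSpamFilter:
    """Baseline multinomial naive Bayes over character n-grams (illustrative)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (assumption: bigrams by default)
        self.alpha = alpha  # Laplace smoothing constant (assumption)
        self.feature_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocabulary = set()

    def train(self, text, label):
        """Add one labeled message ("spam" or "ham") to the counts."""
        grams = char_ngrams(text, self.n)
        self.feature_counts[label].update(grams)
        self.doc_counts[label] += 1
        self.vocabulary.update(grams)

    def log_scores(self, text):
        """Return log P(class) + sum of log P(gram | class) for each class.

        Assumes at least one training message has been seen for each class.
        """
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label, counts in self.feature_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * vocab_size
            for gram in grams:
                # Smoothing keeps unseen n-grams from zeroing the product.
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the class with the higher log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy usage with made-up messages.
spam_filter = NaiveBayesSpamFilter(n=2)
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached", "ham")
print(spam_filter.classify("free money prize"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. Character n-grams need no word segmentation step, which is one reason they are a common choice for text without explicit word boundaries; a feature selection stage such as IG or CHI would sit between `char_ngrams` and `train` to prune the vocabulary.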
Keywords/Search Tags: Spam filtering, Naïve Bayesian model, N-Gram, Feature selection