
The Study On Algorithm Of The Minimum Risk Bayes Spam Filtering

Posted on: 2011-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: H L Wu
GTID: 2198330332471736
Subject: Software engineering
Abstract/Summary:
Electronic mail has become one of the most important means of communication as network and communication technology has advanced. However, spam causes inconvenience in daily life and seriously harms network security, so the spam problem urgently needs solving. Many anti-spam methods exist, and filtering spam with automated text categorization and information filtering has become one of the most effective. We analyzed current content-based spam filtering technology and found many differences between the traditional text categorization problem and the spam filtering problem. Based on this analysis, we develop several methods to improve the performance of the spam filtering algorithm.

Firstly, the thesis surveys a considerable body of anti-spam literature and data, both domestic and international, and analyzes and summarizes existing anti-spam techniques. E-mail filtering is an important measure against spam; at present it is mainly based on IP addresses, rules, or content.

Secondly, the minimum risk Bayes model can reduce the risk of misjudging normal mail as spam. We study the minimum risk Bayes algorithm in detail and propose improvements in four aspects. The first aspect is text representation: we propose a new method, the fingerprint feature. The second aspect is feature selection: we propose a new method, class conditional distribution. The third aspect is e-mail structure: e-mail corpora and ordinary text corpora differ greatly in structure, so we analyzed the structure of e-mail and proposed a model that integrates the e-mail header and the e-mail body. The fourth aspect is a threshold adjusting algorithm. Finally, we combine the four aspects to implement the improved Bayes filter.

Lastly, we design a filter based on the minimum risk Bayes mail filtering algorithm. Compared with Bogofilter, its filtering effect is greatly improved.
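The minimum risk decision rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it substitutes a plain multinomial naive Bayes model with Laplace smoothing for the fingerprint features, feature selection, header/body model, and threshold adjustment described above, and the function names and the cost values (`cost_fp`, `cost_fn`) are assumptions chosen only for illustration. The point it shows is that a higher cost for misjudging normal mail as spam raises the posterior probability required before a message is filtered.

```python
import math
from collections import Counter


def train(messages):
    """Count tokens per class. messages: list of (tokens, label), label in {'spam', 'ham'}."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def posterior_spam(tokens, model):
    """Return P(spam | tokens) under naive Bayes with Laplace (add-one) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


def classify_min_risk(p_spam, cost_fp=9.0, cost_fn=1.0):
    """Pick the action with the lower expected loss.

    cost_fp: assumed loss for filtering a normal mail as spam (false positive).
    cost_fn: assumed loss for letting a spam mail through (false negative).
    """
    risk_if_spam = cost_fp * (1.0 - p_spam)  # expected loss of deciding "spam"
    risk_if_ham = cost_fn * p_spam           # expected loss of deciding "ham"
    return "spam" if risk_if_spam < risk_if_ham else "ham"
```

With these assumed costs, a message is filtered only when P(spam | message) exceeds cost_fp / (cost_fp + cost_fn) = 0.9, whereas equal costs reduce the rule to the usual 0.5 threshold of minimum error Bayes classification.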
Keywords/Search Tags: Spam, Bayes algorithm, Minimum risk Bayes, Feature selection, Fingerprint feature