Research on Hybrid Spam Filtering Technology

Posted on: 2010-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhou
GTID: 2208360275483970
Subject: Information and Communication Engineering
Abstract/Summary:
Electronic mail has become one of the fastest and most economical means of communication. At the same time, the growing volume of junk mail has created a need for e-mail filtering. Anti-spam measures, especially content filtering, have become a research hotspot worldwide.

Naïve Bayes text classification holds a dominant place in spam filtering because of its strong categorization performance and high precision. In practice, however, the features of spam change constantly, and traditional Naïve Bayes text classification cannot adapt to these changes. It must therefore be combined with other techniques to filter evolving spam. HMM-based text deobfuscation is a robust, self-learning technique suited to this purpose. This thesis analyzes the basic principles of Naïve Bayes text classification and HMM-based text deobfuscation, designs a spam filtering algorithm that combines the two, and implements a spam filtering system based on that algorithm.

The main contents are as follows:
(1) A survey of spam filtering, including the definition of spam, the harm it causes, the current situation, and future trends at home and abroad, together with an analysis of the basic principles of Bayesian classification and the Hidden Markov Model (HMM).
(2) A hybrid spam filtering algorithm based on Naïve Bayes text classification and HMM-based text deobfuscation is proposed.
(3) The filtering algorithm, in particular the HMM-based text deobfuscation component, is implemented in C/C++.
(4) A spam filtering system based on the hybrid algorithm is designed. The system is shown to filter spam effectively on English and Chinese e-mail corpora.
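
The hybrid idea described above (first undo character-level obfuscation with an HMM, then classify the cleaned tokens with Naïve Bayes) can be illustrated with a minimal sketch. This is not the thesis implementation, which is written in C/C++ and is not reproduced on this page; the sketch uses Python for brevity. The look-alike table, the emission probabilities (0.8 and 0.2), the tiny word list, the training messages, and all function and class names are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict

LETTERS = "abcdefghijklmnopqrstuvwxyz"
# Assumed look-alike table: hidden letter -> characters a spammer may type instead.
LOOKALIKES = {"a": "@4", "e": "3", "i": "1", "l": "1|", "o": "0", "s": "$5"}


def train_transitions(words):
    """Add-one smoothed letter-bigram log-probabilities; '^' marks word start."""
    counts = defaultdict(Counter)
    for w in words:
        for prev, cur in zip("^" + w, w):
            counts[prev][cur] += 1
    logp = {}
    for prev in "^" + LETTERS:
        total = sum(counts[prev].values()) + len(LETTERS)
        logp[prev] = {c: math.log((counts[prev][c] + 1) / total) for c in LETTERS}
    return logp


def emission_logp(letter, ch):
    """log P(observed ch | hidden letter); the 0.8 / 0.2 split is an assumption."""
    alts = LOOKALIKES.get(letter, "")
    if ch == letter:
        return math.log(0.8 if alts else 1.0)
    if ch in alts:
        return math.log(0.2 / len(alts))
    return float("-inf")


def deobfuscate(token, trans):
    """Viterbi decoding: most likely hidden letter sequence for one token."""
    best = {"^": (0.0, "")}  # hidden state -> (log-probability, decoded prefix)
    for ch in token.lower():
        nxt = {}
        for letter in LETTERS:
            e = emission_logp(letter, ch)
            if e == float("-inf"):
                continue
            nxt[letter] = max(
                (s + trans[prev][letter] + e, p + letter)
                for prev, (s, p) in best.items()
            )
        if not nxt:  # character the model cannot explain: keep the token as typed
            return token.lower()
        best = nxt
    return max(best.values())[1]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; 'ham' means legitimate mail."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, tokens, label):
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def classify(self, tokens):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values()) + len(vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            score += sum(math.log((counts[t] + 1) / total) for t in tokens)
            scores[label] = score
        return max(scores, key=scores.get)


def hybrid_classify(text, trans, nb):
    """Deobfuscate every token with the HMM, then classify with Naive Bayes."""
    return nb.classify([deobfuscate(t, trans) for t in text.split()])


# Toy data, invented for this sketch only.
CLEAN_WORDS = ["viagra", "free", "money", "meeting", "report", "cheap", "pills", "offer"]
TRANS = train_transitions(CLEAN_WORDS)
NB = NaiveBayes()
NB.train("buy cheap viagra pills".split(), "spam")
NB.train("free money offer".split(), "spam")
NB.train("meeting report attached".split(), "ham")
NB.train("project meeting tomorrow".split(), "ham")
```

With this toy data, `hybrid_classify("ch3ap v1agr@ p1lls", TRANS, NB)` returns `"spam"`, whereas calling `NB.classify` on the raw, obfuscated tokens returns `"ham"` because none of them match the trained vocabulary. That gap is the motivation for placing a deobfuscation stage in front of the classifier.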
Keywords/Search Tags: spam, spam filtering, Naïve Bayes text classification, HMM-based text deobfuscation