
E-mail Classification Technique Based on the MNNB Method

Posted on: 2011-10-05 | Degree: Master | Type: Thesis
Country: China | Candidate: W J Hu | Full Text: PDF
GTID: 2208360308466624 | Subject: Information security
Abstract/Summary:
With the development of the Internet, e-mail has become one of the most significant ways for people to access and exchange information, and can be regarded as one of the killer applications of the Internet. Meanwhile, spam has become an increasingly serious global security issue and has drawn growing attention from the community and researchers. Spam takes up limited storage, computing, and network resources, consumes a great deal of processing time, and interferes with users' normal life, work, and study. This thesis provides a comprehensive and systematic study of current anti-spam techniques from a technical point of view, and makes the following contributions:

1) Based on a survey of the prevalent spam filtering methods and techniques, we find that content-based spam classification has become the dominant anti-spam approach. Compared with non-content-based spam classification systems, it is more applicable, more flexible, and more in line with actual demand. It has therefore drawn much attention from both academia and industry, and has gradually become the most popular research direction.

2) Bayesian classifiers play an important role in anti-spam research because of their self-learning capability, adaptability, and high classification accuracy. E-mail classification systems based on Bayesian algorithms usually outperform alternative methods on anti-spam tasks. This thesis provides a comprehensive study of the basic principles and application details of the Bayesian classification technique and the Naïve Bayes algorithm.

3) Based on the above study, we propose a novel content-based spam classification approach named MNNB, which uses the N-gram model together with the Markov chain property to relax the independence assumption of the Naïve Bayes approach. Because the proposed method does not rely on a word segmentation mechanism, it is more adaptable to multilingual spam classification scenarios than traditional dictionary-based approaches. To reduce computational complexity, we further assume that the sentences in the corpus are independent of each other. Experimental results show that the proposed method is superior to the Naïve Bayes approach in terms of classification accuracy, recall, false positive rate, and false negative rate.
Keywords/Search Tags: machine learning, anti-spam, Naïve Bayes, N-gram, Markov chain
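
The abstract does not give the exact MNNB formulation, so the following is only a minimal sketch of the general idea it describes: a Bayes classifier whose per-class likelihood comes from a character-level N-gram Markov chain rather than from independent word features, so no word segmentation is needed. The class name `CharMarkovNB`, the parameters `order` and `alpha`, the additive smoothing scheme, and the toy training messages are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


class CharMarkovNB:
    """Illustrative character-level Markov-chain Bayes classifier.

    Hypothetical sketch, not the thesis's MNNB implementation.
    """

    def __init__(self, order=2, alpha=1.0):
        self.order = order      # number of preceding characters used as context
        self.alpha = alpha      # additive smoothing constant (an assumption)
        self.ngram = {}         # label -> Counter of (context, char) counts
        self.context = {}       # label -> Counter of context counts
        self.docs = Counter()   # label -> number of training messages
        self.vocab = set()      # every character seen during training

    def _pairs(self, text):
        # Pad the start so the first characters also have a full context.
        padded = "\x02" * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.docs[label] += 1
            ngram = self.ngram.setdefault(label, Counter())
            context = self.context.setdefault(label, Counter())
            for ctx, ch in self._pairs(text):
                ngram[(ctx, ch)] += 1
                context[ctx] += 1
                self.vocab.add(ch)
        return self

    def log_score(self, text, label):
        # log P(label) + sum over characters of log P(char | context, label)
        score = math.log(self.docs[label] / sum(self.docs.values()))
        size = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        for ctx, ch in self._pairs(text):
            num = self.ngram[label][(ctx, ch)] + self.alpha
            den = self.context[label][ctx] + self.alpha * size
            score += math.log(num / den)
        return score

    def predict(self, text):
        return max(self.docs, key=lambda label: self.log_score(text, label))


# Toy data invented for this example only.
train_texts = [
    "win free money now",
    "free prize claim now",
    "cheap pills free offer",
    "meeting at noon tomorrow",
    "please review the attached report",
    "lunch with the team today",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = CharMarkovNB(order=2).fit(train_texts, train_labels)
print(model.predict("free money prize"))
print(model.predict("review the report at the meeting"))
```

This sketch treats each message as a single character sequence. The abstract's additional assumption that sentences are independent of each other could be approximated by splitting a message at sentence boundaries and scoring each sentence with a freshly padded context, then summing the log scores.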