E-Mail Classification Technique Based On The MNNB Method

Posted on:2011-10-05

Degree:Master

Type:Thesis

Country:China

Candidate:W J Hu

GTID:2208360308466624

Subject:Information security

Abstract/Summary:

With the development of the Internet, e-mail has become one of the most important ways for people to access and exchange information, and can be regarded as one of the Internet's killer applications. Meanwhile, spam has become an increasingly serious global security issue and has drawn growing attention from the community and researchers. Spam takes up limited storage space, computing and network resources, consumes a great deal of processing time, and interferes with users' normal life, work and study.

This thesis presents a comprehensive and systematic study of cutting-edge anti-spam techniques from a technical point of view, and makes the following contributions:

1) Based on a complete survey of prevalent spam filtering methods and techniques, we found that content-based spam classification has become the dominant anti-spam approach. Compared with non-content-based spam classification systems, it is more applicable, more flexible and more in line with actual demand. It has therefore drawn a lot of attention from both academia and industry, and has gradually become the most popular research direction.

2) Bayesian classifiers play an important role in anti-spam research because of their self-learning capability, self-adaptability and high classification accuracy. E-mail classification systems based on Bayesian algorithms usually outperform alternative methods on anti-spam tasks. This thesis provides a comprehensive study of the Bayesian classification technique and the Naïve Bayes algorithm, covering basic principles and application details.

3) Based on the above study, we propose a novel content-based spam classification approach named MNNB, which uses the N-gram model together with Markov chain properties to relax the independence assumption of the Naïve Bayes approach. Because the proposed method does not rely on a word-segmentation mechanism, it is more adaptable to multilingual spam classification scenarios than traditional dictionary-based approaches. To reduce computational complexity, we further assume that the sentences in the corpus are independent of each other. Experimental results show that the proposed method is superior to the Naïve Bayes approach in terms of classification accuracy, recall rate, false positive rate and false negative rate.
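
The abstract does not give the MNNB formulas, so the following is only a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes a first-order Markov chain over characters (bigrams), Laplace smoothing, and a crude punctuation-based sentence split. The class name `CharMarkovClassifier`, the `alpha` parameter and the toy training messages are all hypothetical. Each sentence is scored independently, and each character is conditioned on the previous one rather than treated as independent, so no word segmentation or dictionary is needed.

```python
import math
import re
from collections import defaultdict


class CharMarkovClassifier:
    """Character-bigram Markov-chain classifier (illustrative sketch, not the thesis's MNNB)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # Laplace smoothing constant (assumed value)
        self.pair_counts = {}               # label -> {(prev_char, char): count}
        self.prev_counts = {}               # label -> {prev_char: count}
        self.doc_counts = defaultdict(int)  # label -> number of training messages
        self.vocab = set()                  # all characters seen in training

    @staticmethod
    def sentences(text):
        # Crude split on sentence punctuation; no word segmentation is performed.
        parts = re.split(r"[.!?。！？\n]+", text.lower())
        return [p.strip() for p in parts if p.strip()]

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            pairs = self.pair_counts.setdefault(label, defaultdict(int))
            prevs = self.prev_counts.setdefault(label, defaultdict(int))
            for sent in self.sentences(text):
                chars = ["<s>"] + list(sent)  # "<s>" marks the sentence start
                self.vocab.update(chars)
                for prev, cur in zip(chars, chars[1:]):
                    pairs[(prev, cur)] += 1
                    prevs[prev] += 1
        return self

    def log_score(self, text, label):
        # log P(label) + sum over sentences of sum_i log P(c_i | c_{i-1}, label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        pairs, prevs = self.pair_counts[label], self.prev_counts[label]
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        for sent in self.sentences(text):
            chars = ["<s>"] + list(sent)
            for prev, cur in zip(chars, chars[1:]):
                num = pairs.get((prev, cur), 0) + self.alpha
                den = prevs.get(prev, 0) + self.alpha * v
                score += math.log(num / den)
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Toy data, invented purely for illustration.
train_texts = [
    "Win cash now! Free prize, click now!",
    "Cheap pills, free offer! Win big cash!",
    "Meeting moved to Monday. Please bring the report.",
    "Can you review the draft report before lunch?",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = CharMarkovClassifier().fit(train_texts, train_labels)
print(clf.predict("win free cash now"))
print(clf.predict("please review the report"))
```

Working in log space avoids numerical underflow on long messages. Raising the order of the chain (trigrams or longer) would capture more context at the cost of sparser counts.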

Keywords/Search Tags:

machine learning, anti-spam, Naïve Bayes, N-gram, Markov chain

Related items

1	Application Of Data Mining In Email Anti-Spam System
2	Content-based Anti-Spam Filtering
3	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And Svm
4	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And SVM
5	Design And Realize Of Anti-Spam Arithmetic
6	PAC-Bayes Bound Theory And Experimental Research On SVM Algorithm
7	Research And Implementation Of Distributed Anti-Spam System Based On Bayesian
8	SVM-Based Novel Method Of Online Spam Filtering
9	Data Mining's Application And Research In The Field Of Anti-Spam
10	Application Of Spam Filtering Technology Based On Genetic Algorithm And Bayes Model