Research On Content-Based Spam Filtering

Posted on: 2008-10-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Yin  Full Text: PDF
GTID: 2178360215969515  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, email has become widely used in people's daily lives. However, the growing volume of spam email annoys users and wastes their time and money as well as network bandwidth. Worse, spam can be harmful to users; for example, it may contain pornographic content. It is therefore very important to solve the spam problem.

Nowadays, blacklist and whitelist techniques, rule-based filtering and keyword-based content filtering are the most common anti-spam approaches. Another approach is to use automated text categorization and information filtering to filter spam. Text categorization algorithms such as Naive Bayes, kNN and decision trees can be applied to spam filtering. Among text classifiers, the Naive Bayes algorithm has been widely used in text classification because of its simplicity, efficiency and accuracy. However, misclassifying legitimate mail as junk during filtering is costly, so measures must be taken to prevent it. Based on a systematic summary of the most recent work on anti-spam, this dissertation explores anti-spam email techniques.

The main contributions of this work are as follows.
1) Analysed the background of spam, including its causes, its harm and the current state of spam filtering, and investigated the anti-spam problem from the text categorization perspective.
2) Analysed the state of Naive Bayesian methods, described the basic steps of the Naive Bayesian algorithm and built a model of it, compared the effects of the number of features, the threshold and corpus variations on the Ling-Spam corpus, and implemented the Naive Bayesian algorithm as a program.
3) Designed a client-side anti-spam filter system that implements sending and receiving email as well as spam filtering; it is valid in practice.
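
To make the role of the threshold concrete, the following is a minimal sketch of a Naive Bayes spam filter. It is not the thesis's implementation, which the abstract does not give. The multinomial model, the add-one (Laplace) smoothing, the whitespace tokenizer, the class name NaiveBayesSpamFilter, the toy training messages and the 0.9 default threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real filter would
    # also apply feature selection to limit the number of features.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, threshold=0.9):
        # A message is flagged only if P(spam | message) >= threshold.
        # A high threshold trades missed spam for fewer legitimate
        # messages misclassified as junk.
        self.threshold = threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(token | label).
        # Requires at least one training message per class.
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                log_p += math.log((self.word_counts[label][tok] + 1) / denom)
        return log_p

    def spam_probability(self, text):
        tokens = tokenize(text)
        log_spam = self._log_joint(tokens, "spam")
        log_ham = self._log_joint(tokens, "ham")
        # Normalize in log space to avoid underflow on long messages.
        m = max(log_spam, log_ham)
        e_spam = math.exp(log_spam - m)
        e_ham = math.exp(log_ham - m)
        return e_spam / (e_spam + e_ham)

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold


def build_toy_filter(threshold=0.9):
    # Hypothetical toy corpus, not data from the thesis.
    f = NaiveBayesSpamFilter(threshold=threshold)
    f.train("cheap pills buy now", "spam")
    f.train("win money now free offer", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("please review the project report", "ham")
    return f
```

On this toy corpus, a message such as "free money now" is flagged at the default threshold of 0.9 but not at 0.95, which shows how raising the threshold reduces the risk of discarding legitimate mail at the price of letting more spam through.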
Keywords/Search Tags: spam email, filtering techniques, Naïve Bayes, text categorization, feature extraction