
Research on Content-Based Spam Filtering Technology

Posted on: 2007-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
GTID: 2178360185967857
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online electronic information is booming, and e-mail has become the fastest and most economical form of communication available. Unfortunately, junk mail (also referred to as "spam") has spread widely at the same time. Spam not only fills up mail server storage space but also forces users to spend a great deal of time removing it. It is therefore important to develop an automated mail filter. At present, blacklist/whitelist techniques, rule-based filtering, and keyword-based content filtering are the most common anti-spam approaches. Another approach applies automated text categorization and information filtering to spam; text categorization algorithms such as Naive Bayes, kNN, and decision trees can be used for this purpose. Compared with other text classifiers, the Naive Bayes algorithm has been widely used in text classification because of its simplicity, efficiency, and accuracy. However, misclassifying legitimate mail as spam during filtering is very costly, so measures must be taken to prevent it. The contents of this thesis are as follows:
1. Introduces the background of spam, including its definition, history, and harms.
2. Summarizes the current state of spam filtering.
3. Investigates the anti-spam problem from a text categorization perspective, introducing the feature selection approaches, classifiers, and e-mail corpora used in this task.
4. Analyzes the Naive Bayes method in detail, including its current state, the two Naive Bayes models, and some suggestions for improving the Naive Bayes filter, and compares the effects of the number of features, the threshold, and variations of the Ling-spam corpus.
5. Finally, summarizes many kinds of technologies and designs a...
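The abstract describes Naive Bayes filtering combined with feature selection and a decision threshold that guards against flagging legitimate mail, but it does not give the thesis's exact formulation. The Python sketch below is a minimal illustration under stated assumptions, not the author's method. It uses the multinomial Naive Bayes model with Laplace smoothing (the multivariate Bernoulli model is the other common variant; the abstract does not say which two models the thesis compares), selects features by information gain (an assumed metric), and blocks a message only when its estimated spam probability exceeds a threshold. The whitespace tokenizer, the toy messages, and the names `NaiveBayesFilter`, `information_gain`, `num_features`, and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, vocab):
    """Information gain of each term's presence/absence with respect to the class."""
    n = len(docs)
    doc_sets = [set(tokenize(d)) for d in docs]
    base = entropy(list(Counter(labels).values()))
    scores = {}
    for term in vocab:
        with_t = [lab for s, lab in zip(doc_sets, labels) if term in s]
        without_t = [lab for s, lab in zip(doc_sets, labels) if term not in s]
        cond = 0.0
        for part in (with_t, without_t):
            if part:
                cond += len(part) / n * entropy(list(Counter(part).values()))
        scores[term] = base - cond
    return scores


class NaiveBayesFilter:
    """Multinomial Naive Bayes with Laplace smoothing; label 1 = spam, 0 = legitimate."""

    def __init__(self, num_features=10, threshold=0.9):
        self.num_features = num_features  # hypothetical: size of the selected vocabulary
        self.threshold = threshold        # hypothetical: block only if P(spam|x) > threshold

    def fit(self, docs, labels):
        vocab = {t for d in docs for t in tokenize(d)}
        ig = information_gain(docs, labels, vocab)
        # Keep the highest-scoring terms; ties are broken alphabetically.
        ranked = sorted(vocab, key=lambda t: (-ig[t], t))
        self.features = set(ranked[: self.num_features])
        self.log_prior, self.log_likelihood = {}, {}
        for c in (0, 1):
            class_docs = [d for d, lab in zip(docs, labels) if lab == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in tokenize(d) if t in self.features)
            total = sum(counts.values())
            # Laplace (add-one) smoothing over the selected features.
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / (total + len(self.features)))
                for t in self.features
            }
        return self

    def spam_probability(self, doc):
        # Log-space class scores; tokens outside the selected features are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in tokenize(doc) if t in self.features)
            for c in (0, 1)
        }
        top = max(scores.values())  # subtract the max before exponentiating for stability
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        return weights[1] / (weights[0] + weights[1])

    def is_spam(self, doc):
        return self.spam_probability(doc) > self.threshold


# Toy training data, invented for illustration (not taken from Ling-spam).
TRAIN_DOCS = [
    "win money now claim your free prize",
    "free money offer click now",
    "cheap pills free offer",
    "meeting agenda for the linguistics seminar",
    "please review the draft of the paper",
    "seminar schedule and paper deadline",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    nb = NaiveBayesFilter(num_features=10, threshold=0.9).fit(TRAIN_DOCS, TRAIN_LABELS)
    for msg in ["claim your free money now", "draft of the seminar paper"]:
        print(msg, "->", round(nb.spam_probability(msg), 3), nb.is_spam(msg))
```

Raising `threshold` makes the filter more conservative: on this toy data, "claim your free money now" is blocked at a threshold of 0.9 but passes at 0.99, trading more missed spam for fewer legitimate messages flagged, which is the trade-off the abstract's concern about misclassified legitimate mail points to.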
Keywords/Search Tags: spam filtering, text categorization, naive Bayes, feature extraction