Research And Implementation Of Chinese Spam Filtering Method Based On Data Mining

Posted on: 2006-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Ling
GTID: 2208360182956737
Subject: Software engineering
Abstract/Summary:
With the wide adoption of e-mail, spam, which carries commercial advertisements, malicious programs, and sensitive content, increasingly threatens the security of computer systems and people's daily lives. Fighting spam has become a significant, practical problem of international concern.

There are two major approaches to automated mail filtering: rule-based and probability-based. Compared with other text classifiers, the Naive Bayes algorithm is more widely used in text classification because this simple method classifies texts accurately and quickly. Misclassifying a legitimate mail as spam causes greater loss than misclassifying spam as legitimate. However, the conventional Naive Bayes method neither considers the differing features of legitimate mail and spam during classification and filtering nor accounts for the loss incurred by misclassifying legitimate mail as spam, which limits its usefulness for mail filtering.

This thesis designs a system architecture for a spam filter based on data mining and provides an effective way to implement it. It studies the technologies involved in spam filtering systems, including the Naive Bayes algorithm, an improved Naive Bayes algorithm, Chinese text segmentation, and automated text classification. The improved Naive Bayes algorithm is applied to e-mail filtering, and the results of two filtering experiments are compared.
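The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of one common way to make Naive Bayes account for the asymmetric loss described above: a mail is labeled spam only when the posterior odds of spam exceed a cost ratio. It assumes a multinomial model with Laplace smoothing and input that has already been segmented into tokens (Chinese word segmentation is out of scope here). The names `train`, `log_odds`, `classify`, and `cost_ratio` are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter


def train(docs):
    """Count words and documents per class.

    docs: list of (tokens, label) pairs, with label "spam" or "ham".
    """
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def log_odds(tokens, model):
    """Return log P(spam | tokens) - log P(ham | tokens)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # Laplace (add-one) smoothed word likelihood
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return scores["spam"] - scores["ham"]


def classify(tokens, model, cost_ratio=9.0):
    """Label as spam only if the posterior odds exceed cost_ratio.

    cost_ratio is an assumed parameter: how many times worse it is to
    block a legitimate mail than to let one spam through.
    """
    return "spam" if log_odds(tokens, model) > math.log(cost_ratio) else "ham"
```

With `cost_ratio=1.0` this reduces to the ordinary Naive Bayes decision rule; raising it makes the filter more reluctant to flag mail as spam, trading missed spam for fewer blocked legitimate messages.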
Keywords/Search Tags: Spam, data mining, Naive Bayes algorithm, e-mail filter, automatic text segmentation, feature extraction