The Research And Application Of Text Categorization Arithmetic In Spam Filtering

Posted on: 2007-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: J S Wang
Full Text: PDF
GTID: 2178360182496282
Subject: Computer software and theory
Abstract/Summary:
Electronic mail (e-mail) has become one of the fastest and most economical means of communication. At the same time, the growing problem of junk mail (also called "spam") has created a need for e-mail filtering. Spam harms Internet users and companies alike and causes substantial economic losses and social harm. By a general estimate, junk mail costs China about 6,069,000,000 yuan every year, and dealing with it also consumes a great deal of people's time. Faced with this worsening situation, people have begun looking for solutions. Many governments and organizations have introduced legislation and conventions, but these have not fully resolved the problem, so many people are turning to anti-spam filtering instead.

Current anti-spam measures commonly include blacklist and whitelist technology, manually written rules, and keyword-based content filtering. Another approach is to use automated text categorization and information filtering, in which an e-mail filtering system learns directly from a user's own mail. Text categorization algorithms such as Naive Bayes, kNN, decision trees, and SVM can be applied to spam filtering. However, the effectiveness of Naive Bayes is limited and it is not well suited to instant feedback learning, while the other algorithms are more effective but computationally expensive. Starting from an analysis of Naive Bayes, this thesis seeks an anti-spam filtering method that is fast, simple to compute, effective, and convenient for feedback learning.

At present, content-based spam filtering comprises rule-based and statistics-based methods. This thesis surveys research methods and techniques used in spam filtering.
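
As a rough illustration of the statistics-based approach named in the abstract, the sketch below shows a plain multinomial Naive Bayes filter with Laplace smoothing. It is not the method developed in the thesis and not the EM-Naive Bayes variant listed in the keywords. The class name `NaiveBayesSpamFilter`, the labels `"spam"` and `"ham"`, the tokenizer, and the idea of folding user feedback in by calling `learn` again are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes filter (hypothetical, not the thesis's code)."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Very simple tokenizer: lowercase runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        # Training and user feedback both reduce to updating the counts.
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        # log P(label) + sum over known tokens of log P(token | label), both smoothed.
        total_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + self.alpha) / (
            total_docs + self.alpha * len(self.LABELS)
        )
        score = math.log(prior)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for token in self.tokenize(text):
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def classify(self, text):
        # Ties fall back to "ham" so that unknown mail is not discarded.
        if self.log_score(text, "spam") > self.log_score(text, "ham"):
            return "spam"
        return "ham"


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.learn("win cash prize now", "spam")
    f.learn("cheap pills win money", "spam")
    f.learn("meeting agenda for monday", "ham")
    f.learn("lunch with the project team", "ham")
    print(f.classify("win a cash prize"))
    print(f.classify("project meeting monday"))
```

Because the model is just a set of counts, a correction from the user can be applied by calling `learn` with the corrected label, without retraining from scratch. Whether that is fast and accurate enough for instant feedback is the kind of question the abstract raises about Naive Bayes, and this sketch does not settle it.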
Keywords/Search Tags:text categorization, spam filtering, Naive Bayes, EM-Naive Bayes, information filtering, feedback