Research On Content-Based Spam Filtering

Posted on: 2005-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: W F Pan
GTID: 2178360185495543
Subject: Computer software and theory
Abstract/Summary:
Electronic mail (e-mail) is becoming one of the fastest and most economical means of communication available. At the same time, the growing problem of junk mail (also referred to as "spam") has created a need for e-mail filtering. Current anti-spam measures commonly include blacklists and whitelists, manually written rules, and keyword-based content filtering.

Another approach is to filter spam with automated text categorization and information filtering, so that an e-mail filtering system learns directly from a user's own mail. Text categorization algorithms such as Naïve Bayes, kNN, Decision Tree and Boosting can be applied to spam filtering. However, the effectiveness of Naïve Bayes is limited and it is not well suited to instant feedback learning, while the other algorithms are more effective but computationally expensive. To address this problem, we propose using Winnow, a fast linear classifier. Winnow is trained online and is mistake-driven, which also makes it suitable for learning from feedback. Experiments on public e-mail corpora show that it is effective.

The contents of this thesis are as follows:
1) A summary of the state of spam filtering.
2) An investigation of the anti-spam problem from the text categorization perspective, introducing the feature selection approaches, classifiers and e-mail corpora used in this task.
3) A study of methods for learning a Naïve Bayesian classifier, comparing the influence of the number of features, the threshold and variations of the corpus on PU1.
4) A demonstration that the Winnow algorithm is effective at filtering spam on the PU1 and Ling-Spam e-mail corpora.
5) A discussion of feedback learning in the anti-spam task, taking Naïve Bayes and Winnow as examples.
6) Finally, the design of a framework for a spam filtering system.
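To make the phrase "online and mistake driven" concrete, the following is a minimal sketch of the classic Winnow update rule over binary word features. It is not the thesis's implementation: the class name `Winnow`, the methods `score`, `predict`, `update` and `train`, the promotion factor `alpha = 2.0`, the default threshold of half the vocabulary size, and the toy messages are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple


class Winnow:
    """Illustrative Winnow classifier over binary word features (assumed setup)."""

    def __init__(self, vocabulary: Iterable[str], alpha: float = 2.0,
                 threshold: Optional[float] = None) -> None:
        # Every known term starts with weight 1.0.
        self.weights = {term: 1.0 for term in vocabulary}
        self.alpha = alpha
        # Assumed default: half the vocabulary size.
        self.threshold = len(self.weights) / 2.0 if threshold is None else threshold

    def _active(self, tokens: Iterable[str]) -> List[str]:
        # Binary features: a term is either present in the message or not.
        return [t for t in set(tokens) if t in self.weights]

    def score(self, tokens: Iterable[str]) -> float:
        return sum(self.weights[t] for t in self._active(tokens))

    def predict(self, tokens: Iterable[str]) -> bool:
        # True means "spam".
        return self.score(tokens) > self.threshold

    def update(self, tokens: Iterable[str], is_spam: bool) -> bool:
        """Process one labelled message; return True if a mistake was made."""
        if self.predict(tokens) == is_spam:
            return False  # mistake-driven: correct predictions change nothing
        # Promote active weights on a missed spam, demote on a false alarm.
        factor = self.alpha if is_spam else 1.0 / self.alpha
        for term in self._active(tokens):
            self.weights[term] *= factor
        return True

    def train(self, examples: List[Tuple[List[str], bool]], max_epochs: int = 10) -> int:
        """Repeat passes over the data until an epoch has no mistakes."""
        for epoch in range(1, max_epochs + 1):
            mistakes = sum(self.update(tokens, label) for tokens, label in examples)
            if mistakes == 0:
                return epoch
        return max_epochs


# Toy data, invented for illustration only.
examples = [
    (["free", "money"], True),
    (["meeting", "report"], False),
    (["free", "meeting"], True),
    (["money", "report"], False),
]
model = Winnow(["free", "money", "meeting", "report"])
epochs_used = model.train(examples)
```

Because `update` handles one labelled message at a time and only changes weights after a misclassification, the same call can serve as the feedback step when a user corrects the filter's decision, which is the property the abstract highlights.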
Keywords/Search Tags: spam filtering, text categorization, Naïve Bayes, Winnow, feedback, information filtering