
Rough Set Based Spam Filtering

Posted on: 2007-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: C L Chen
GTID: 2178360185459243
Subject: Computer application technology
Abstract/Summary:
Electronic mail (e-mail) is one of the most popular services on the Internet and has brought great convenience to daily work and life. At the same time, it has brought an annoying byproduct: spam (also referred to as "junk mail"). Technical and legal anti-spam efforts have brought spam under control to some degree. Common anti-spam measures today include blacklists and whitelists, manually written rules, and keyword-based content filtering. Many e-mail filtering systems, such as Bayesian filters, still fall short of ideal filtering performance. Their main problem is a high probability of misclassifying legitimate mail as spam, which discourages users from using e-mail filters at all. Rough set based spam filtering is a rule-based content filtering method. Applying rough sets to spam filtering is a new research task, and it can reduce the rate at which legitimate mail is misclassified as spam.

With these observations in mind, the work of this thesis is as follows:
1. Discuss what spam is and the harm it causes.
2. Present typical anti-spam techniques and discuss the fundamental principles of e-mail classification and their classification accuracy.
3. Present a rough set based spam filtering model and its workflow, and improve the system model on this basis.
4. Study the feature selection problem in the rough set based spam filtering system and present a new feature selection method that combines Mitra's method with Sequential Forward Selection and improves the system's classification accuracy.
5. Run experiments on several feature selection algorithms with Weka (a Java-based machine learning package), evaluate the results, and validate the proposed algorithm.
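The abstract names Sequential Forward Selection as one ingredient of the proposed feature selection method but gives no details. The following is only a minimal sketch of plain Sequential Forward Selection, not the thesis's combined algorithm. It assumes the rough set dependency degree (the fraction of messages whose feature values determine their class unambiguously) as the selection criterion. The function names, the three keyword features, and the toy data are all hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, features):
    """Rough set dependency degree of the class label on `features`.

    Rows that agree on every chosen feature form one equivalence class.
    The result is the share of rows lying in classes with a single label.
    """
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        seen_labels[key].add(label)
        sizes[key] += 1
    consistent = sum(n for key, n in sizes.items() if len(seen_labels[key]) == 1)
    return consistent / len(rows)


def sequential_forward_selection(rows, labels, n_features):
    """Greedily add the feature that raises the dependency degree the most.

    Stops as soon as no remaining feature improves the score.
    """
    selected = []
    best = dependency(rows, labels, selected)
    remaining = list(range(n_features))
    while remaining:
        scored = [(dependency(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda s: s[0])  # first maximum on ties
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Hypothetical toy data: each row marks whether a keyword occurs in a message.
# Columns (assumed for illustration): 0 = "free", 1 = "meeting", 2 = "winner".
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

selected, best = sequential_forward_selection(rows, labels, n_features=3)
print(selected, best)
```

In this toy data a single column already separates the two classes, so the greedy search stops after one step and discards the other two features. A real filter would use many more features and a held-out accuracy check rather than the training data alone.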
Keywords/Search Tags: Spam, Rough Set, Email classification, Feature Selection