Research on Content-Based Spam Filter Technology

Posted on: 2008-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Jia
GTID: 2178360215464586
Subject: Computer application technology
Abstract/Summary:
With the rapid spread of the Internet, email has become one of the major means of communication. The emergence and flood of spam, however, has also drawn much attention. Spam filtering is an active research topic, and a variety of anti-spam technologies have emerged. A content-based spam filter is usually the core of a complete filtering system and must meet high demands on performance and accuracy.

This thesis discusses existing content-based spam filtering methods and introduces the principles of email and email transfer. Building on a study of rough set attribute discretization, rough set attribute reduction, and support vector machine classification, it proposes a spam filtering solution based on rough sets and support vector machines (RS-SVM). After email content segmentation, feature selection, and feature weight calculation, each email is represented as a vector in a vector space. Rough set discretization is then used to discretize the attribute values in the vector space, and rough set reduction is used to reduce its dimensionality so as to minimize the number of features. Finally, the reduced vector space is used as input for training the SVM classifier that serves as the filter.

Experiments on a public email corpus indicate the feasibility of the solution. Comparison experiments are also made between the SVM classifier with and without rough set reduction. The results show that the proposed solution not only maintains classification accuracy but also improves classification speed, which means it can filter new emails faster. The solution is applied to client-side spam filtering. Finally, the class diagram, flow chart, interface, and function modules of the prototype spam filtering system are given.
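The discretization and reduction steps described in the abstract can be illustrated with a small sketch. The abstract does not say which discretization or reduction algorithms the thesis uses, so everything here is an assumption made for illustration: equal-width binning stands in for rough set discretization, a greedy search on the rough set dependency degree stands in for attribute reduction, and the weight table, labels, and function names (`discretize`, `dependency`, `greedy_reduct`) are hypothetical. The SVM training step is not shown.

```python
from collections import defaultdict

def discretize(column, bins=2):
    """Equal-width binning (an assumed stand-in for rough set discretization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]

def dependency(rows, labels, attrs):
    """Rough set dependency degree: the fraction of rows in the positive region,
    i.e. rows whose equivalence class under `attrs` carries a single label."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)

def greedy_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree the most,
    until it equals the dependency degree of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)

# Hypothetical feature weights: one row per email, one column per term.
weights = [
    [0.9, 0.0, 0.8, 0.5],
    [0.8, 0.1, 0.1, 0.4],
    [0.7, 0.0, 0.9, 0.6],
    [0.1, 0.9, 0.2, 0.5],
    [0.0, 0.8, 0.0, 0.4],
    [0.2, 0.7, 0.1, 0.6],
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Discretize column by column, then transpose back into rows.
columns = [discretize(list(col)) for col in zip(*weights)]
table = [list(row) for row in zip(*columns)]

reduct = greedy_reduct(table, labels)
reduced_table = [[row[a] for a in reduct] for row in table]
```

On this toy table the first term alone separates the two classes, so `reduct` contains only attribute 0 and `reduced_table` has a single column. In a pipeline like the one the abstract describes, the reduced table would then be passed to an SVM classifier for training.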
Keywords/Search Tags: Spam Filter, Rough Set, Support Vector Machine, Attribute Reduction