
Research On Content-Based Spam Filtering Technology

Posted on: 2012-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yan
Full Text: PDF
GTID: 2218330338972828
Subject: Safety Technology and Engineering
Abstract/Summary:
Because it is convenient and efficient, e-mail has become an important tool for obtaining information and communicating rapidly. However, the growing volume of spam causes great losses: it consumes network resources and poses a threat to network security. Research on spam filtering technology is therefore a significant task. At present, text classification technology is being introduced into spam filtering, and content-based filtering is becoming a research hotspot. This paper introduces the current kinds of spam filtering technology and analyzes the disadvantages of some algorithms that have been applied in the spam filtering field. Improvements are proposed to address these disadvantages. Feature selection is a core part of spam filtering and affects both its accuracy and its efficiency. After analysis and comparison of well-known feature selection algorithms, Information Gain (IG) is implemented in this paper. To address the serious redundancy among the selected features, a feature selection method based on IG and rough set theory is presented, and an attribute reduction algorithm based on attribute union theory is proposed. Experiments and analysis show that the method effectively eliminates redundancy, yields more representative feature subsets, and improves the accuracy and efficiency of spam filtering. The precision and efficiency of spam filtering depend on the quality of the classifier. After analysis and comparison of several well-known text classification algorithms, the Bayesian method is implemented. Considering the respective advantages and disadvantages of the Naive Bayes method and the Bayesian network method, a double-level Bayesian network classification algorithm is proposed. In addition, to reduce the damage caused by misclassifying e-mail, a double-level Bayesian network classification algorithm based on risk-minimization decision is proposed.
Finally, we design a spam filtering prototype based on the Bayesian model with the related improvements. 20 figures, 7 tables, 43 references...
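
As a rough illustration of two ideas named in the abstract, the sketch below computes Information Gain for a binary "term occurs in the message" feature and applies a minimum-risk decision rule to a spam probability. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the toy messages, and the 9:1 cost ratio are all hypothetical, and neither the rough-set attribute reduction nor the double-level Bayesian network is reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the message'.

    docs is a list of sets of tokens; labels is the parallel list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy remaining after splitting on the term, weighted by split size.
    conditional = sum(
        len(part) / n * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional


def min_risk_label(p_spam, cost_false_positive=9.0, cost_false_negative=1.0):
    """Choose the label with the lower expected loss.

    The costs are assumed values: blocking a legitimate message is treated
    as nine times worse than letting one spam message through.
    """
    risk_if_spam = (1.0 - p_spam) * cost_false_positive  # legitimate mail blocked
    risk_if_ham = p_spam * cost_false_negative           # spam delivered
    return "spam" if risk_if_spam < risk_if_ham else "ham"


# Toy corpus (hypothetical): each message is a set of tokens.
docs = [
    {"free", "offer"},
    {"free", "meeting"},
    {"meeting", "agenda"},
    {"offer", "free"},
]
labels = ["spam", "ham", "ham", "spam"]

# Rank candidate terms by IG, highest first.
ranked = sorted(
    {"free", "offer", "meeting"},
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
```

With asymmetric costs like these, a message is labeled spam only when its spam probability exceeds 0.9, which reflects the abstract's goal of reducing the damage caused by misclassifying legitimate e-mail.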
Keywords/Search Tags: spam, Bayesian method, rough set, feature selection