The Design Of Spam Filtering System Based On Neural Network Ensemble

Posted on: 2011-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: B P Liu
Full Text: PDF
GTID: 2178360308476107
Subject: Computer application technology
Abstract/Summary:
The popularization and widespread application of the network have made e-mail an important means of information exchange. However, spam seriously disrupts people's work and daily life, so research on spam filtering technology is of great significance. Existing spam filtering technologies still have many shortcomings, and spam cannot be filtered out completely. To approach the ideal of filtering out all spam, more effective filtering techniques that improve e-mail classification accuracy need to be studied.

Ensemble learning can improve the classification accuracy of a classifier. Neural networks are among the more effective methods currently used in machine learning, but a single neural network easily falls into a local minimum and may then assign an e-mail to the wrong category. A neural network ensemble combines a number of different neural networks into a single classifier whose output is determined by integrating the outputs of the individual networks. This thesis applies that idea to improve the generalization ability of the learning system and thereby the performance of the filtering system.

The spam filtering system model designed in this thesis consists of three parts: preprocessing, feature selection, and classifier design. Preprocessing converts the standard e-mail corpus into vector space model (VSM) form, which a computer can readily process. Feature selection uses the information gain (IG) algorithm to reduce the dimensionality of the data and improve the efficiency of the algorithm. Classifier design constructs an e-mail classifier that filters spam using the Boosting and Bagging neural network ensemble methods; the category of an e-mail is determined by combining the outputs of multiple individual classifiers. Experiments are carried out on the PU series of spam corpora, and the ensemble is compared with a single RBF neural network classifier. In addition to traditional evaluation indicators, an evaluation method based on the confusion matrix is used, and the results indicate that the neural network ensemble is more effective at filtering spam.
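
The three stages named in the abstract can be illustrated with a small Python sketch. It is not the thesis implementation: the toy e-mails, the function names, and the use of simple majority voting as the combination rule are all assumptions made for illustration, and no neural network is trained here. It shows how information gain scores a binary "term present" feature, how the labels produced by several classifiers might be combined, and how a confusion matrix is tallied.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the e-mail'.

    docs is a list of sets of terms (a hypothetical, simplified
    stand-in for a vector space model representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def majority_vote(predictions):
    """Combine the labels from several classifiers by simple majority.

    Assumed combination rule; ties go to the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def confusion_matrix(true_labels, predicted_labels, positive="spam"):
    """Count true/false positives and negatives, treating spam as positive."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Toy data, invented for this example only.
docs = [{"free", "money"}, {"free", "offer"},
        {"meeting", "today"}, {"project", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]

# "free" separates the two classes perfectly, "today" only partly.
ig_free = information_gain(docs, labels, "free")
ig_today = information_gain(docs, labels, "today")

# Three hypothetical ensemble members vote on one e-mail.
combined = majority_vote(["spam", "ham", "spam"])

matrix = confusion_matrix(labels, ["spam", "ham", "ham", "ham"])
```

In a real system, terms would be ranked by information gain and only the highest-scoring ones kept as features, and each vote would come from a network trained on a different resampling (Bagging) or reweighting (Boosting) of the training set.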
Keywords/Search Tags: Spam Filtering, Preprocess, Feature Selection, Neural Network Ensemble