
Research On Spam Filtering Based On AdaBoost And Active Learning Methods

Posted on: 2017-10-20  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Liu  Full Text: PDF
GTID: 2348330482986644  Subject: Software engineering
Abstract/Summary:
With the promotion of the Internet Plus initiative, the Internet has become increasingly important to our country and society. Internet-based communication has displaced traditional channels such as the telegraph, fax, and telephone, and communication software is now the main way people keep in touch, which gives it an important role. E-mail remains the most commonly used tool at work and is still irreplaceable, but spam degrades the user experience, so spam filtering is an important problem.

Firstly, traditional machine learning methods are prone to over-fitting, suffer from high generalization error, and require constant parameter tuning. To avoid these problems, this thesis builds AdaBoost ensemble classifiers on different base classifiers. We discuss AdaBoost classifiers whose weak learners are methods such as Naive Bayes and logistic regression, and for each data set we choose the AdaBoost variant whose base classifier best matches the characteristics of that data set. The experiments show that AdaBoost ensemble learning can often effectively avoid over-fitting and further improve classifier performance.

Secondly, we apply active learning with different sampling strategies to further optimize the existing classifiers. Active learning concentrates on the data points closest to the separating hyperplane, that is, the points that are hardest to classify correctly, rather than labelling points that are already easy to classify. This keeps the labelled data set small, reduces the number of samples that must be annotated, saves labelling time, and increases sampling efficiency, which ultimately improves the efficiency of the classifier.

Finally, we combine AdaBoost built on different weak classifiers with active learning based on different sampling strategies, and analyse the combination on different data sets. The method applies AdaBoost over a single weak classifier or over multiple weak classifiers, and then uses active-learning sampling strategies for further optimization. The result is an effective classification process that reduces the generalization error rate, avoids over-fitting, and improves classification performance.
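The following is a minimal sketch (not the thesis code) of the combined approach described above: an AdaBoost ensemble over a Naive Bayes weak learner, refined by an uncertainty-sampling active-learning loop that only queries labels for the messages the current ensemble is least sure about. It assumes scikit-learn; the toy corpus, the query batch size, and the three-round budget are illustrative choices, not the settings used in the thesis.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Tiny illustrative corpus; a real experiment would use a spam corpus
# such as SpamAssassin or Enron-Spam.
texts = [
    "win a free prize claim your reward now",      # spam
    "cheap meds discount offer click here",        # spam
    "limited offer win cash prize today",          # spam
    "exclusive deal free money guaranteed",        # spam
    "meeting moved to ten tomorrow morning",       # ham
    "please review the attached project report",   # ham
    "lunch with the team on friday",               # ham
    "minutes from yesterday's design review",      # ham
    "free vacation prize winner call now",         # spam
    "agenda for next week's planning meeting",     # ham
]
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
X = CountVectorizer().fit_transform(texts)

def uncertainty_query(model, X_pool, batch_size=2):
    """Return indices (into X_pool) of the samples whose predicted spam
    probability is closest to 0.5, i.e. nearest the decision boundary."""
    proba = model.predict_proba(X_pool)[:, 1]
    return np.argsort(np.abs(proba - 0.5))[:batch_size]

# Seed with two labelled examples per class; the rest form the unlabelled pool.
labelled = np.array([0, 1, 4, 5])
pool = np.setdiff1d(np.arange(len(y)), labelled)

for _ in range(3):  # three query rounds; the budget here is arbitrary
    # AdaBoost over a Naive Bayes weak learner (scikit-learn >= 1.2 calls the
    # parameter `estimator`; older versions name it `base_estimator`).
    clf = AdaBoostClassifier(estimator=MultinomialNB(), n_estimators=25)
    clf.fit(X[labelled], y[labelled])
    if pool.size == 0:
        break
    picked = pool[uncertainty_query(clf, X[pool])]
    labelled = np.concatenate([labelled, picked])  # the "oracle" labels these
    pool = np.setdiff1d(pool, picked)

print("training-set accuracy after active learning:", clf.score(X, y))
```

In each round the ensemble is retrained only on the labelled subset, and the queried points are the ones whose spam probability is closest to 0.5, which mirrors the hyperplane-distance criterion described above while keeping the labelled set small.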
Keywords/Search Tags: AdaBoost, active learning, over-fitting, spam filtering