The Study Of SPAM Filtering Method Based On Risk Minimization

Posted on: 2013-05-19; Degree: Master; Type: Thesis
Country: China; Candidate: X W Yu; Full Text: PDF
GTID: 2248330395475667; Subject: Software engineering
Abstract/Summary:
With the rapid development and spread of Internet technology, e-mail has become part of users' daily lives because of its convenience and low cost. However, e-mail brings not only convenience but also intrusion, in the form of spam. Spam seriously pollutes the network, and finding an effective spam filtering technique has become key to a usable Internet. Among content-based mail filtering techniques, Bayesian and support vector machine (SVM) filters, both drawn from machine learning, have been widely used because of their strong filtering performance. The main research work of this thesis is as follows:

1. Text representation in spam filtering. We construct a hash function based on the Karp-Rabin algorithm to collect fingerprints of texts.

2. Feature selection. We use a feature selection algorithm based on class condition for the Bayes classifier and one based on improved mutual information for the SVM. Considering the differences between e-mail and plain text, we construct a weighted model that integrates the message headers and body.

3. An in-depth study of the Bayes classification algorithm based on decision-risk minimization and the SVM classification algorithm based on structural risk minimization. We observe that the loss from classifying a legitimate message as spam is much larger than the loss from classifying spam as legitimate. We therefore introduce a cost factor and a penalty factor into the two classification algorithms to minimize the risk of the e-mail filtering model, and we implement and compare risk-minimizing mail filtering systems based on the Bayes algorithm and on SVM.

4. Evaluation. We test the minimum-risk Bayesian and SVM mail filtering algorithms on a mail filtering platform using the SEWM2012 data set. Experimental results show improved reliability and validity compared with the classical Bogofilter.
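To make the idea of a cost factor in the third item concrete, the following is a minimal Python sketch of a minimum-risk Bayes decision. It is not the thesis's implementation: the toy naive Bayes model with Laplace smoothing, the function names (`train`, `posterior_spam`, `classify`), and the cost values `fp_cost` and `fn_cost` are all assumptions made for illustration. A message is labeled spam only when the expected loss of that label is lower than the expected loss of labeling it legitimate.

```python
import math
from collections import Counter


def train(messages):
    """Count tokens per class. `messages` is a list of (tokens, label)
    pairs with label in {"spam", "ham"} (hypothetical toy format)."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for tokens, label in messages:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab


def posterior_spam(tokens, model):
    """Return P(spam | tokens) under naive Bayes with Laplace smoothing."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    log_score = {}
    for label in ("spam", "ham"):
        total = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)  # class prior
        for t in tokens:
            if t in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][t] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Normalise in a numerically stable way.
    m = max(log_score.values())
    exp = {k: math.exp(v - m) for k, v in log_score.items()}
    return exp["spam"] / (exp["spam"] + exp["ham"])


def classify(p_spam, fp_cost=9.0, fn_cost=1.0):
    """Minimum-risk decision. fp_cost is the assumed loss of marking a
    legitimate message as spam; fn_cost is the assumed loss of letting
    spam through. Both values are illustrative, not from the thesis."""
    risk_if_spam = fp_cost * (1.0 - p_spam)  # expected loss of saying "spam"
    risk_if_ham = fn_cost * p_spam           # expected loss of saying "ham"
    return "spam" if risk_if_spam < risk_if_ham else "ham"


# Tiny made-up training set, for demonstration only.
model = train([
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("meeting agenda today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
])
p = posterior_spam(["money"], model)
print(p, classify(p, fp_cost=1.0), classify(p, fp_cost=9.0))
```

With equal costs the rule reduces to the usual 0.5 posterior threshold. Raising `fp_cost` moves the threshold to `fp_cost / (fp_cost + fn_cost)`, so fewer legitimate messages are discarded at the price of letting more spam through.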
Keywords/Search Tags: Support Vector Machine, Bayes, Risk Minimization, Feature Selection, Threshold Adjustment