Research Of The Spam Filtering Based On SVM And D-S Theory

Posted on: 2009-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
GTID: 2178360275950863
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet, electronic mail, thanks to its speed and convenience, has gradually become one of the most important means of communication in people's work and everyday life. However, the accompanying spam problem is growing increasingly serious. Spam not only spreads illegitimate information but also consumes a large share of public Internet resources and infringes on the legitimate rights of email users and enterprises. Many spam filtering methods exist, yet the spam problem has grown rather than diminished, which shows that their filtering performance still falls short of the ideal. It is therefore still worthwhile to research a more effective spam filtering system.

Support Vector Machine (SVM) is a relatively new pattern recognition method based on statistical learning theory. It shows particular advantages on pattern recognition problems with limited samples, non-linearity and high dimensionality: it takes generalization ability into account while seeking the optimal result under limited samples. In this thesis, SVM is applied to e-mail filtering. However, this technique is usually applied to spam identification based on the textual content of the mail body only, and is seldom discussed for mail headers. Mails with short bodies, including empty bodies, are difficult to judge as spam by analyzing body text alone. If the features of mail headers are also considered, the results can be more objective and accurate. In addition, because spam can disguise itself well, or the keywords of legitimate mail can match those of spam, mail samples differ in how clearly they can be classified. A sample often cannot be assigned clearly to one category, and describing it with a probability or degree of membership improves accuracy. It is therefore not appropriate for mail classification to predict only a class label such as y∈{-1,+1}.

To solve these two problems, this thesis proposes using SVM with probability outputs to classify e-mail according to the features of mail headers and mail bodies separately, and adding "unsure" mail to the frame of discernment for mail. That is, during preprocessing, keywords are extracted from mail headers and mail bodies separately and a feature lexicon is built for each. SVMs are then trained on headers and bodies separately, and mail is classified by the SVM with probability outputs, which yields the basic probability assignments of mail headers and mail bodies over three categories: spam, ham and unsure.

Dempster-Shafer (D-S) evidence theory is a mathematical method, based on evidence and its combination, for reasoning under uncertainty. It can be used effectively to improve target recognition, because combining D-S evidence reduces the uncertainty of recognition. This thesis therefore proposes using the combination rule of D-S theory to combine the basic probability assignments of mail headers and mail bodies, calculating the probability assignment of each mail over the three classes, and then judging the class of the mail according to a decision rule based on basic probability assignment. This reduces the uncertainty of mail identification and improves the accuracy of spam identification.

In summary, this thesis proposes a spam discrimination model based on SVM and D-S theory. It uses SVM with probability outputs to classify mail according to the features of mail headers and mail body text, and uses D-S theory to make the final spam decision, which helps improve the accuracy of the spam filter.
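The combination step described in the abstract can be illustrated with a minimal sketch of Dempster's rule of combination. This is not the thesis's implementation. It assumes that "unsure" is modelled as mass on the whole frame {spam, ham}, that the basic probability assignments for header and body are already available (the numbers below are made up for illustration), and that the decision rule is a simple margin test; the abstract does not specify any of these details, and the names `combine`, `decide` and `margin` are hypothetical.

```python
from itertools import product

# Focal elements of the frame of discernment {spam, ham}.
SPAM = frozenset({"spam"})
HAM = frozenset({"ham"})
UNSURE = frozenset({"spam", "ham"})  # assumption: "unsure" = the whole frame


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass falling on the empty set is the conflict K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting; rule is undefined")
    # Normalise by 1 - K so the combined masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def decide(m, margin=0.1):
    """Hypothetical decision rule: pick the stronger singleton class only if
    it beats the other class by `margin` and exceeds the unsure mass."""
    spam, ham, unsure = m.get(SPAM, 0.0), m.get(HAM, 0.0), m.get(UNSURE, 0.0)
    label, top, other = ("spam", spam, ham) if spam >= ham else ("ham", ham, spam)
    if top - other > margin and top > unsure:
        return label
    return "unsure"


# Illustrative (made-up) basic probability assignments, standing in for
# the outputs of the header SVM and the body SVM.
header_bpa = {SPAM: 0.6, HAM: 0.1, UNSURE: 0.3}
body_bpa = {SPAM: 0.5, HAM: 0.2, UNSURE: 0.3}

fused = combine(header_bpa, body_bpa)
print({"/".join(sorted(k)): round(v, 4) for k, v in fused.items()})
print(decide(fused))
```

In this example both sources lean towards spam, so the combined mass on spam is higher than either source gives alone and the mass left on "unsure" shrinks, which is the uncertainty-reduction effect the abstract attributes to D-S combination.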
Keywords/Search Tags: spam, mail filtering, feature selection, SVM (support vector machine), D-S (Dempster-Shafer) theory