
The Research Of Spam Filtering Technology Based On Fuzzy Support Vector Machines

Posted on: 2011-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: H T Zhao
GTID: 2178360332955995
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, e-mail has become a widely used means of modern communication. While people enjoy its convenience, they are also burdened by large volumes of spam. The Support Vector Machine (SVM) is a machine learning method grounded in statistical learning theory. It is based on structural risk minimization and offers advantages such as effectiveness on small samples, good generalization ability, and global optimization. SVM has been applied successfully in many fields and has become an active research topic in spam filtering.

This thesis studies the working principle of e-mail, the mail format, and e-mail preprocessing techniques in order to obtain a vector representation of e-mail for filtering. It then focuses on SVM and fuzzy support vector machine (FSVM) techniques, introduces FSVM into spam filtering, designs a new fuzzy membership function, and, in view of the serious consequences of misclassifying legitimate e-mail, uses different penalty parameters C. Finally, it proposes an FSVM spam filtering method based on misclassification loss and evaluates it in a simulation experiment.

The main research contents and innovations of this thesis are as follows:

1) Study of the working principle of e-mail, the related e-mail protocols, and e-mail preprocessing. The focus is on feature extraction and vector representation of e-mail: Chinese text in the e-mail is segmented by combining forward maximum matching with reverse maximum matching, features are selected by document frequency, and a vector space model is built with TF-IDF weighting.

2) Study and comparison of SVM-based e-mail filtering with other mail filtering techniques.
SVM-based spam filtering has advantages such as effectiveness on small samples, good generalization ability, and global optimization, but it also has two obvious problems. First, mail filtering is in fact an uncertain information processing problem, whereas the SVM approach treats it as a deterministic one. Second, the SVM approach treats misclassification of legitimate mail and misclassification of spam as equivalent, ignoring the fact that misclassifying a legitimate mail is more serious than misclassifying spam.

3) Introduction of FSVM into spam filtering, focusing on the fuzzy membership function and the penalty parameters. A new fuzzy membership function based on the class center is designed, and an FSVM spam filtering method based on misclassification loss is proposed.

4) Study and design of more appropriate evaluation measures for spam filtering, e.g. LP, LR, and WR, mainly using the recall of legitimate mail together with other comprehensive indicators.

A simulation experiment compares the proposed FSVM-based method with SVM in spam filtering. The results show that the FSVM method that accounts for misclassification loss maintains a high spam filtering rate while also maintaining a high recall of legitimate e-mail, which addresses the problem that misclassifying a legitimate e-mail is more serious than misclassifying spam.
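
The combination of a class-center fuzzy membership with class-dependent penalties can be illustrated with a minimal sketch. The abstract does not give the thesis's new membership function, so the code below assumes the widely used class-center form s_i = 1 - d_i / (r + delta), where d_i is the distance of sample i to its class center and r is the class radius. The function names, the label convention (+1 legitimate, -1 spam), and the values of `c_legit`, `c_spam`, and `delta` are all hypothetical choices for illustration, not the author's settings.

```python
import math


def class_center(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def fuzzy_memberships(points, delta=1e-6):
    """Assumed class-center membership: s_i = 1 - d_i / (r + delta).

    d_i is the Euclidean distance to the class center and r is the largest
    such distance in the class; delta > 0 keeps every membership positive.
    """
    center = class_center(points)
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


def sample_penalties(X, y, c_legit=10.0, c_spam=1.0, delta=1e-6):
    """Per-sample penalty C_i = C_class * s_i.

    y uses +1 for legitimate mail and -1 for spam (an assumed convention).
    c_legit > c_spam makes errors on legitimate mail cost more.
    """
    penalties = [0.0] * len(X)
    for label, c in ((1, c_legit), (-1, c_spam)):
        idx = [i for i, t in enumerate(y) if t == label]
        if not idx:
            continue  # no samples of this class
        members = fuzzy_memberships([X[i] for i in idx], delta)
        for i, s in zip(idx, members):
            penalties[i] = c * s
    return penalties


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for TF-IDF vectors.
    X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
    y = [1, 1, 1, -1, -1]
    for xi, yi, ci in zip(X, y, sample_penalties(X, y)):
        print(xi, yi, ci)
```

The resulting penalties could be supplied as per-sample weights to an SVM trainer that supports them. Samples far from their class center, which are more likely to be noise, then contribute less to the objective, while legitimate mail as a whole is penalized more heavily than spam when misclassified.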
Keywords/Search Tags: Spam, SVM, Fuzzy Membership, FSVM, Loss of Misclassification