Research Of E-mail Filtering Based On SVM

Posted on: 2008-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: J L Yang
Full Text: PDF
GTID: 2178360212995556
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and its applications, spam has become a serious nuisance for users. It infringes on the legal rights of email users, threatens Internet information security, and causes great economic losses every year. Research on effective email filtering methods therefore has great practical value.

The support vector machine (SVM) is a relatively new machine learning method based on statistical learning theory. It follows the structural risk minimization principle, which aims to improve the generalization ability of the learning machine: if the error on a limited set of training samples is small, the error on independent test samples should remain small as well. SVM training is a convex optimization problem, so any local optimum is also the global optimum. SVM has been shown to outperform traditional learning machines and has been adopted as a powerful tool for solving classification problems.

We find that current machine learning methods classify an email as either legitimate or spam with certainty. In practice, however, different users on the server side hold different opinions on whether an email is legitimate, and to what extent. Email filtering should therefore be treated as a problem involving uncertainty. In this paper, to formalize this uncertainty, legitimate email is treated as a fuzzy concept on a set of email samples. Its membership function is obtained by aggregating the opinions of Internet users with the ordered weighted averaging (OWA) operator. Because the training samples carry membership degrees of legitimate email, a fuzzy support vector machine (FSVM) is adopted to classify emails, and the penalty factor of the FSVM is determined by content-specific misclassification costs. The advantages of our method are: 1) the uncertainty of legitimate email, i.e., the membership degree, is considered when classifying emails, and a method for obtaining the membership degree is given; 2) content-specific misclassification costs are used to determine the penalty factor of the FSVM.

In addition, the above filtering method assigns a fuzzy attitude of legitimacy to both legitimate and spam samples in the training model, which may introduce logical ambiguity. We therefore present an improved email filtering method based on one-class support vector machines (1-SVM). First, the fuzzy factor of FSVM, i.e., the fuzzy attitude of email samples, is introduced into 1-SVM, so that the uncertainty of email is handled under a one-class classification principle. Meanwhile, the penalty factor model for legitimate email under content-specific misclassification costs is integrated into 1-SVM to ensure the effectiveness of filtering. Only legitimate samples are required to build the filtering model, and spam is detected by that model. Legitimate email is understood as a fuzzy concept on a set of email samples, and all email samples are endowed with a single fuzzy attitude, the legitimate attitude, so the method no longer has logical ambiguity.

Finally, simulation experiments are conducted to evaluate the effectiveness and human consistency of the two methods.
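
The step of aggregating user opinions into a membership degree can be illustrated with a generic OWA operator. The following is a minimal sketch, not the thesis's implementation: the function name `owa`, the example ratings, and the weight vector are all assumptions chosen for illustration. The defining property of OWA is that weights are attached to rank positions (largest rating first) rather than to particular users.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to ranks, not to sources.

    values  -- per-user ratings in [0, 1] of how legitimate an email is
               (hypothetical input format, assumed for this sketch)
    weights -- non-negative weights summing to 1, one per rank position
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort ratings from largest to smallest, then take the weighted sum.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical ratings from three users for one email.
ratings = [0.9, 0.4, 0.7]
# Hypothetical rank weights emphasising the middle opinion.
rank_weights = [0.2, 0.5, 0.3]

membership = owa(ratings, rank_weights)
print(f"membership degree of 'legitimate': {membership:.2f}")
```

With weights `[1, 0, ..., 0]` the operator reduces to the maximum rating, with `[0, ..., 0, 1]` to the minimum, and with equal weights to the arithmetic mean, so the weight vector controls how optimistic or pessimistic the aggregated membership degree is.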
Keywords/Search Tags:Email filtering, SVM, FSVM, OWA, 1-SVM