Research Of E-mail Filtering Based On SVM

Posted on:2008-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:J L Yang

Full Text:PDF

GTID:2178360212995556

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet and its applications, spam has become a serious problem for email users. It infringes on the legal rights of email users, threatens Internet information security, and causes great losses to the national economy every year. Research on effective email filtering methods is therefore of great practical value.

The support vector machine (SVM) is a machine learning method based on statistical learning theory. It follows the structural risk minimization principle, which aims to improve the generalization ability of the learning machine: if the error is small on a limited set of training samples, the error should remain small on independent test samples. SVM training is a convex optimization problem, so any local optimum is also the global optimum. SVM has been shown to outperform traditional learning machines and has been introduced as a powerful tool for solving classification problems.

We find that current machine learning methods classify each email as either legitimate or spam with certainty. In practice, however, different users on the server side hold different opinions about whether an email is legitimate, and to what extent. Email filtering should therefore be treated as a problem involving uncertainty. In this paper, to formalize this uncertainty, "legitimate email" is understood as a fuzzy concept on a set of email samples. Its membership function is obtained by aggregating the opinions of Internet users with the ordered weighted averaging (OWA) operator. Because the training samples carry membership degrees of legitimate email, a fuzzy support vector machine (FSVM) is adopted to classify emails, and the penalty factor of the FSVM is determined by content-specific misclassification costs. The advantages of our method are: 1) the uncertainty of legitimate email, i.e., the membership degree, is considered in classifying emails, and a method to obtain the membership degree is given; 2) content-specific misclassification costs are used to determine the penalty factor of the FSVM.

In addition, in the above filtering method both legitimate and spam samples are assigned a fuzzy attitude of "legitimate" in the training model, which may introduce logical ambiguity. We therefore present an improved email filtering method based on one-class support vector machines (1-SVM). First, the fuzzy factor of FSVM, i.e., the fuzzy attitude of email samples, is introduced into 1-SVM. In this way, the uncertainties of email are processed under a one-class classification principle. Meanwhile, the penalty factor model for legitimate email under content-specific misclassification costs is integrated into 1-SVM to ensure effective filtering. Only legitimate samples are required to build the filtering model, and spam is detected by it. Legitimate email is understood as a fuzzy concept on a set of email samples, and all email samples are assigned a single one-class fuzzy attitude, the legitimate attitude. The method therefore no longer has logical ambiguity.

Finally, simulation experiments are conducted to evaluate the effectiveness and human consistency of the two methods.
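
The aggregation step named in the abstract can be illustrated with a minimal sketch of an OWA operator. The abstract does not give the weight vector or the rating scale, so the function name `owa`, the five user ratings, and the weights below are all hypothetical and only show how the operator turns several user opinions into one membership degree.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: weights attach to ranks, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort opinions from largest to smallest, then take the weighted sum.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical data: five users rate how legitimate one email is, in [0, 1].
opinions = [0.9, 0.2, 0.7, 1.0, 0.6]
# Hypothetical weights that discount the highest and lowest ratings.
weights = [0.1, 0.2, 0.4, 0.2, 0.1]

membership = owa(opinions, weights)  # membership degree of "legitimate"
```

Choosing the weight vector changes the behaviour of the operator: `[1, 0, ..., 0]` returns the maximum rating, `[0, ..., 0, 1]` the minimum, and uniform weights the plain mean. In a fuzzy SVM, a membership degree of this kind is typically used to scale each training sample's misclassification penalty; how the thesis combines it with content-specific costs is not stated in the abstract.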

Keywords/Search Tags:

Email filtering, SVM, FSVM, OWA, 1-SVM

Related items

1	The Research Of Performance Tuning On The Prefix Email Filtering System
2	Design And Implementation Of The Sme Web Mail System
3	A comparison of email filtering techniques
4	The Research And Design Of E-mail Pre-processing And Filtering Management System
5	Secure Email Server System
6	Research On Email Filtering Mechanism Based On Cloud-Computing Techniques
7	Research On MTS Based On Improved Measurement Tool And Threshold Calculation Method And Its Application In Email Filtering
8	The Design And Implementation Of Spam Email Filtering Technology Based On Content Analysis
9	Research On The Method Of Chinese Email Filtering Based On SVM
10	Design And Implementation Of Spear-phishing Email Detection System Based On Email Content Mining