Research On The Spam Filter Algorithm Based On Support Vector Machines

Posted on: 2009-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2178360245954985
Subject: Computer application technology
Abstract/Summary:
Spam not only occupies storage space on mail servers but also disrupts normal communication. Controlling spam effectively has become an important problem, and many researchers have worked on spam filtering technology. Support vector machines, which are used to construct classifiers and to discover knowledge in KDD (knowledge discovery in databases), adopt structural risk minimization to improve the generalization performance of learning machines. This thesis therefore focuses on spam filter algorithms based on support vector machines.

The study of kernel functions is one of the key issues in support vector machines. In this thesis, a spam filter algorithm based on support vector machines is first implemented, and the kernel functions used in support vector machines are examined in detail. New kernel functions based on the characteristics of mail corpora are then proposed. Finally, experimental results indicate that the new kernel functions are reasonable: they simplify the choice of parameters and also improve the algorithm's performance.

The main contributions of this thesis are as follows:

First, the spam filter algorithm based on support vector machines is analyzed and implemented. Experiments on different mail sample corpora are used to analyze how the characteristics of a mail corpus affect the filter algorithm. Experiments with the radial basis function and the polynomial kernel function then reveal the relationship between kernel functions and the filter algorithm.

Second, based on the characteristics of mail corpora, a new radial basis function, MRbf, and a new polynomial kernel function, MPloy, are presented. Following the conclusions of the second and third chapters, the radial basis function and the polynomial kernel function are improved to obtain MRbf and MPloy. Comparative experimental results show that MRbf and MPloy are easy to apply and improve the algorithm's performance.

Third, a new combinational kernel function is proposed, based on the radial basis function and the polynomial kernel function. The learning ability and generalization ability of the two kernel functions are compared. From an analysis of their advantages and disadvantages, the combinational kernel function is constructed as a convex combination of the two. However, the proposed kernel has four parameters, which are not easy to control. Based on the characteristics of mail corpora, a new kernel function, MRP, is used to improve the combinational kernel function. Experimental results show that the proposed MPR kernel function greatly improves the performance of the filter algorithm.
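
The convex combination of a radial basis function kernel and a polynomial kernel described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything here is an assumption: the textbook forms of the two kernels, the parameter names `lam`, `gamma`, `degree`, and `coef0` (one plausible reading of the "four parameters"), and the toy mail vectors. The MRbf, MPloy, and MRP kernels are not defined in the abstract and are not reproduced.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Textbook RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Textbook polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def combined_kernel(x, y, lam=0.5, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination lam * RBF + (1 - lam) * polynomial.

    The parameter set (lam, gamma, degree, coef0) is an assumption;
    the thesis may parameterize its combined kernel differently.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (lam * rbf_kernel(x, y, gamma)
            + (1.0 - lam) * poly_kernel(x, y, degree, coef0))


def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical term-count vectors standing in for three mails.
mails = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
K = gram_matrix(mails, combined_kernel)
```

Because a convex combination of two valid kernels is itself a valid kernel, a Gram matrix such as `K` could be passed to any SVM trainer that accepts precomputed kernels. Setting `lam` to 1 or 0 recovers the pure radial basis function or polynomial kernel, which shows why tuning all four values together is harder than tuning either kernel alone.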
Keywords/Search Tags: support vector machines, spam filter, MRbf, MPloy, combinational kernel function