Spam Filtering Based On Kernel Partial Least Squares Feature Extraction

Posted on:2013-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:J Chen

Full Text:PDF

GTID:2248330362469984

Subject:Computer application technology

Abstract/Summary:

Email is one of the most widely used services on the Internet. As the Internet has developed, spam has proliferated and caused considerable trouble for society. How to block spam effectively has therefore become a widely studied problem in information security and information processing, with both theoretical significance and practical value.

At present, content-based spam filtering is one of the key research directions in this area. It is a supervised learning task and a special case of classification. Many machine learning methods have been applied to spam filtering with good results, but data represented in the vector space model are high-dimensional and sparse, and their terms are correlated (for example, synonyms), which makes classification difficult. It is therefore necessary to reduce the dimensionality of spam data. Feature extraction is an important family of dimensionality reduction methods and includes principal component analysis (PCA) and partial least squares (PLS). PCA and PLS were proposed for linear problems, but many problems are nonlinear, so kernel methods were introduced, yielding kernel PCA (KPCA) and kernel PLS (KPLS). These methods have been widely used in text mining and genetic data analysis with great success.

PLS maximizes the covariance between the original features and the class characteristics, uncovers the inherent hidden features in the original features, and thereby obtains a new low-dimensional feature space. Kernel partial least squares introduces a kernel function into partial least squares; it works well for reducing the dimensionality of spam data and offsets the adverse effects of correlated variables.

Building on research into spam filtering technologies, this work focuses on implementing feature extraction for spam filtering with PLS and KPLS. A comparative experiment using different classification algorithms (support vector machine, SVM, and K-nearest neighbor) is conducted to show the performance of PCA and KPCA on feature extraction. The email corpora used in the experiment are TREC06C and Enron-Spam. Analysis of the comparative experiment leads to the conclusion that the efficiency of spam filtering is improved.
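
To make the feature extraction step more concrete, the following is a minimal sketch of kernel PLS score extraction for a single class-label response, written with NumPy. It is not the thesis's implementation: the function names `rbf_kernel` and `kpls_scores`, the RBF kernel, the `gamma` value, and the random toy data standing in for term-weight vectors are all assumptions made for illustration. The loop follows the commonly described NIPALS-style kernel PLS procedure, which for a one-column response reduces to taking `t = K y`, normalizing it, and deflating both the kernel matrix and the response.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpls_scores(K, y, n_components):
    """Extract kernel PLS score vectors for the training set (single response y)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = H @ K @ H                         # center the kernel in feature space
    y = y - y.mean()                      # center the response
    T = np.zeros((n, n_components))
    for i in range(n_components):
        t = K @ y                         # direction of maximal covariance with y
        t = t / np.linalg.norm(t)
        T[:, i] = t
        P = np.eye(n) - np.outer(t, t)
        K = P @ K @ P                     # deflate the kernel matrix
        y = y - t * (t @ y)               # deflate the response
    return T

# Toy stand-in for a document-term matrix: 40 "emails", 20 term weights (assumed data).
rng = np.random.default_rng(0)
X = rng.random((40, 20))
s = X[:, 0] * X[:, 1]                     # a nonlinear rule defines the two classes
y = np.where(s > np.median(s), 1.0, -1.0)

K = rbf_kernel(X, X, gamma=0.5)           # gamma is an arbitrary illustrative choice
T = kpls_scores(K, y, n_components=3)     # 20 original features reduced to 3 scores
```

The columns of `T` are the low-dimensional features for the training emails; in a pipeline like the one the abstract describes, they would then be passed to a classifier such as an SVM or a K-nearest neighbor model. Projecting unseen emails requires an additional step that this sketch omits.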

Keywords/Search Tags:

spam, high-dimension, kernel partial least squares, non-linear

Related items

1	Spam Filtering Based On Partial Least Squares
2	PLS Algorithm And Its Applications To SRM-Based Machine Learning
3	Research And Application Of Partial Least Squares Based Dimension Reduction
4	Industrial Process Monitoring Based On Kernel Partial Least Squares
5	Research On Partial Least Squares Dimension Reduction Based Facial Age Estimation
6	Kernel partial least squares (K-PLS) for scientific data mining
7	Performance Optimization Of Non-linear Classifier
8	The Research On The Methods Of Soft-sensing And Its Industrial Application
9	Image Feature Extraction Methods
10	Response Surface Modeling By Local Kernel Partial Least Squares