
Support Vector Machine Learning Algorithms Based On Within-Class Structure

Posted on: 2016-10-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J An
Full Text: PDF
GTID: 1228330470455951
Subject: Signal and Information Processing

Abstract/Summary:
Support Vector Machine (SVM) is a machine learning approach based on VC theory and the principle of structural risk minimization from statistical learning theory. Given a small sample set, it seeks the best trade-off between model complexity and learning ability in order to achieve the best generalization. SVM rests on a solid mathematical foundation and handles many practical difficulties well, including small samples, nonlinearity, overfitting, high dimensionality, and local minima. Owing to this excellent performance, it has been widely applied in many areas and has become a hot topic in machine learning.

This dissertation focuses on the SVM classifier. We point out that standard SVM ignores an important piece of prior knowledge, namely the within-class structure of the samples. Accordingly, the dissertation studies support vector machine learning algorithms based on within-class structure and proposes an improved algorithm based on within-class scatter. For the existing problems of noise, outlier detection, and imbalanced data learning, we also propose corresponding new algorithms. The main contributions of this dissertation are as follows:

(1) We first analyze the similarities and differences between SVM and Fisher Discriminant Analysis (FDA) and point out that SVM ignores the within-class structure of the samples. We then propose a new classification algorithm, SVM based on within-class scatter (WCS-SVM), which incorporates FDA's minimum within-class scatter criterion into SVM. The main idea is to find an optimal hyperplane that maximizes the margin while keeping the within-class scatter as small as possible. Numerical experiments show that the proposed WCS-SVM algorithm performs well. Finally, we combine Unsupervised Clustering (UC) with WCS-SVM and apply it to intrusion detection.
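Contribution (1) centres on the within-class scatter matrix S_w that FDA minimizes and that WCS-SVM adds to the SVM objective. A minimal NumPy sketch of how S_w is computed (the function name and toy data are illustrative, not taken from the dissertation):

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes c of sum over samples x in c of
    (x - m_c)(x - m_c)^T, where m_c is the mean of class c."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)  # centre each class on its own mean
        S_w += diff.T @ diff
    return S_w

# Toy two-class data: two tight, well-separated clusters
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y = np.array([0, 0, 1, 1])
S_w = within_class_scatter(X, y)
```

WCS-SVM then augments the plain margin term of standard SVM with a term involving w^T S_w w, so that the optimal hyperplane both maximizes the margin and keeps the within-class scatter small.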
The results further validate that the proposed algorithm is efficient.

(2) To better characterize each training sample's contribution to the classification hyperplane, we propose a new fuzzy membership function for the Fuzzy Support Vector Machine (FSVM) based on the affinity among samples. This membership function considers both the distance between each sample and its class center and the affinity among sample points. We introduce two different parameters, one for the positive class and one for the negative class, to measure within-class affinity; since these parameters must be set beforehand, we determine them with Support Vector Data Description (SVDD). Experimental results show that the new fuzzy membership function reduces the effect of outliers and noise more efficiently. To handle classification problems with outliers or noise even better, we not only assign a fuzzy membership to each sample but also account for the within-class structure of the training set. We propose a new classification algorithm, FSVM based on within-class scatter (WCS-FSVM), which incorporates FDA's minimum within-class scatter criterion into FSVM. The dissertation derives the algorithm in detail and gives a rigorous proof of its convergence. Experiments show that WCS-FSVM improves both classification accuracy and generalization ability while handling classification problems with outliers or noise more effectively.

(3) Outlier detection is an important research topic in data mining and machine learning. Its task is to identify sample points that deviate markedly from the normal data, so a reliable outlier detector needs a model that encloses the normal data tightly. Making full use of the sample information, we propose an improved One-Class Support Vector Machine (OC-SVM) for outlier detection.
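The distance-to-class-centre component of the fuzzy membership in contribution (2) can be sketched as follows. This is a generic FSVM-style membership based only on distance to the class centre, not the dissertation's affinity-based function with SVDD-determined parameters; all names and data are illustrative:

```python
import numpy as np

def fuzzy_membership(X, delta=1e-3):
    """Weight each sample of one class by its distance to the class
    centre: near-centre samples get memberships close to 1, far
    samples (likely outliers or noise) get memberships close to 0."""
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    # delta keeps the farthest sample's membership strictly positive
    return 1.0 - d / (d.max() + delta)

# One class with an obvious outlier at (10, 0)
X_pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
s = fuzzy_membership(X_pos)
```

In FSVM these memberships rescale each sample's error cost, so outliers contribute less to the position of the hyperplane.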
In our experiments, we use overall accuracy (OA) and the Kappa coefficient κ as criteria to evaluate performance, comparing the improved algorithm with GDD, NNDD, PCA, and standard OC-SVM. The results demonstrate that the proposed algorithm is effective, outperforms the compared algorithms, and raises overall accuracy appreciably.

(4) Imbalanced data is widespread in practice, and imbalanced data learning is likewise a research hotspot in data mining and machine learning. The imbalance between positive and negative training samples shifts the SVM classification hyperplane. To suppress this shift effectively, the dissertation proposes Different Error Costs SVM based on within-class scatter (DEC-WCSSVM). The new algorithm assigns different error costs to the minority and majority classes and also considers the within-class structure of the training samples, thereby reducing the impact of imbalanced data on classification performance. In our experiments, we adopt G-means as the evaluation criterion. The results demonstrate that DEC-WCSSVM improves both minority-class accuracy and G-means, and effectively suppresses the shift of the classification hyperplane.
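G-means, the criterion used to evaluate DEC-WCSSVM in contribution (4), is the geometric mean of sensitivity and specificity, so it stays low unless both the minority and majority classes are classified well. A small self-contained sketch (function name and toy labels are illustrative):

```python
import numpy as np

def g_means(y_true, y_pred):
    """Geometric mean of sensitivity (recall on the positive/minority
    class) and specificity (recall on the negative/majority class)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(np.sqrt(sensitivity * specificity))

# Imbalanced toy labels: 2 positives, 3 negatives; one positive missed
y_true = np.array([1, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0])
g = g_means(y_true, y_pred)  # sensitivity 0.5, specificity 1.0
```

Because the mean is geometric rather than arithmetic, a classifier that ignores the minority class entirely scores 0 regardless of its accuracy on the majority class.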
Keywords/Search Tags: Statistical Learning Theory, Support Vector Machine, Fuzzy Membership Function, Within-Class Structure, Within-Class Scatter, Outlier Detection, Imbalanced Data Learning