The Research Of Support Vector Machine Based On Sample Selection

Posted on: 2015-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2268330431951851
Subject: Computer software and theory
Abstract/Summary:
Support Vector Machine (SVM), a representative classification algorithm in data mining that is grounded in statistical learning theory, builds an optimal separating hyper-plane by analyzing known data with true labels. Because it yields a globally optimal solution and generalizes well, SVM is widely regarded as indispensable in practical applications.
In recent years, research on SVM has focused mainly on two aspects: (1) improving accuracy and (2) reducing computational complexity. Shrinking the training set is one way to reduce the computational complexity of SVM. The reason is that the location of the optimal separating hyper-plane is determined only by a small subset of the samples, those located near the class border, so building the classification model does not require training on all the samples.
Since the location of the SVM separating hyper-plane is determined by a few key samples, called support vectors, this paper reduces the size of the training set by selecting the few samples that are most likely to be support vectors and forming a new training set from them, thereby decreasing the computational complexity. For SVM, the samples near the decision border are the key to finding the optimal separating hyper-plane. According to the clustering hypothesis, those samples generally lie in the sparse region between two classes and are easily misclassified during clustering.
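The dependence of the hyper-plane on a few samples can be read off the standard soft-margin SVM dual problem. The formulation below is the textbook one and is not quoted from the thesis; the symbols (multipliers \alpha_i, penalty C, kernel K, bias b, labels y_i in {-1, +1}) are the usual conventions and are assumed here.

```latex
\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad\text{s.t.}\quad 0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i=0
\]
\[
f(x)=\operatorname{sign}\Big(\sum_{i:\,\alpha_i>0}\alpha_i\,y_i\,K(x_i,x)+b\Big)
\]
```

Only samples with \alpha_i > 0 (the support vectors) appear in the decision function, so removing the other samples from the training set leaves the solution unchanged, provided the true support vectors are retained.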
Based on the above, each sample is first assigned a cluster label by a clustering algorithm before the classification model is built. The cluster label is compared with the true class label, and the samples whose labels disagree are gathered into a misclassified set (misC) as an initial screening. For each sample in misC, an even number of its nearest neighbors is selected, the relation between their labels is analyzed, and some of the neighbors are chosen according to two proposed rules. Rule one: choose the neighbors whose labels differ from that of the misclassified sample. Rule two: select the most informative sample; if the number of neighbors with the same label as the misclassified point equals the number with a different label, the misclassified point itself is regarded as the most informative sample. Under these selection rules, the samples most likely to be support vectors are selected to form the new training set, on which the classification model is constructed.
This paper combines the K-Means and FCM clustering algorithms with SVM, yielding the K-SVM and F-SVM algorithms, which select the samples that satisfy the proposed rules and build the model on them. The experiments show that K-SVM reduces the training set size without losing the generalization ability of the model, and that F-SVM builds the classification model quickly and selects samples effectively, using only a small part of the samples while keeping the accuracy within a certain range.
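The selection procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not say how cluster labels are matched to class labels, which distance is used, or how many neighbors are taken, so the sketch assumes majority-vote matching, Euclidean distance, and a caller-chosen even `n_neighbors`. The names `kmeans` and `select_samples` are hypothetical, only the K-Means variant is shown (FCM is omitted), and the final SVM training step is left out.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its closest center (Euclidean distance).
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign


def select_samples(points, labels, n_clusters=2, n_neighbors=4):
    """Return (sorted indices kept for training, indices in misC)."""
    if n_neighbors % 2 != 0:
        raise ValueError("the abstract asks for an even number of neighbors")
    assign = kmeans(points, n_clusters)

    # Assumption: a cluster "predicts" the majority true label of its members.
    majority = {}
    for c in set(assign):
        votes = Counter(l for l, a in zip(labels, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]

    # Initial screening: samples whose cluster label disagrees with the truth.
    mis_c = [i for i, (l, a) in enumerate(zip(labels, assign))
             if majority[a] != l]

    selected = set()
    for i in mis_c:
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        neighbors = others[:n_neighbors]
        different = [j for j in neighbors if labels[j] != labels[i]]
        # Rule one: keep the neighbors labeled differently from the sample.
        selected.update(different)
        # Rule two: an even split marks the sample itself as most informative.
        if 2 * len(different) == len(neighbors):
            selected.add(i)
    return sorted(selected), mis_c
```

The returned indices would be used to subset the original data before fitting an SVM with whatever implementation is at hand. Each misclassified sample triggers a full sort of the remaining points, which is fine for a sketch but would call for a spatial index on large data sets.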
Keywords/Search Tags: Clustering Hypothesis, Support Vector Machine, Clustering Algorithm, Sample Selection