
Research on Support Vector Machines for Solving Large-Scale Data Sets

Posted on: 2007-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: W M Chen
Full Text: PDF
GTID: 2178360215997509
Subject: Precision instruments and machinery
Abstract/Summary:
The Support Vector Machine (SVM) is a novel machine learning method that has become a focus of machine learning research because of its excellent learning performance. SVMs have been applied successfully in many fields, such as bioinformatics, face detection, and handwritten digit recognition. As a relatively new technique, however, SVMs still have shortcomings that require further research, most notably the high time cost of training on large-scale data sets.

This thesis focuses on training algorithms for SVMs on large-scale data sets. We first introduce the basic concepts of SVM theory. We then discuss the causes of the large memory footprint and slow iterations in SVM training, and review several successful fast training algorithms. Finally, two improved SVM algorithms are proposed, focusing on sample pre-processing and working-set selection. The main contributions are as follows:

1. We formulate SVM training as a large-scale convex quadratic programming problem and summarize the generalization performance of SVMs. We then compare popular fast training algorithms such as Chunking, Decomposition, and SMO.

3. Sample-reduction methods for SVMs are discussed in depth. We analyze existing sample-reduction strategies and classify them into three categories, and then propose a new sample-reduction method based on K-closest sub-clusters. A clustering technique such as K-means is used to find initial sub-clusters; the sub-clusters near the class boundary are then selected as the training set, which contains most of the border vectors. The SVM training set is thus reduced by discarding the majority of insignificant samples, and training is accelerated with little effect on the classification results.

4. Reserve Working Set strategies, which are an important step in SVM training, are discussed.
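The sub-cluster idea in point 3 can be sketched as follows. This is a minimal illustrative sketch, not the thesis's exact K-closest sub-cluster rule: it clusters each class with a plain Lloyd's K-means, then keeps only the samples from the few sub-clusters of each class whose centroids lie closest to the opposite class. The parameters `k` and `keep`, and the centroid-distance selection criterion, are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # distance of every sample to every centroid -> nearest-centroid labels
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def reduce_by_subclusters(X_pos, X_neg, k=5, keep=2):
    """Cluster each class into k sub-clusters, then keep only samples from
    the `keep` sub-clusters per class whose centroids lie closest to the
    other class's centroids (the presumed boundary sub-clusters)."""
    c_pos, l_pos = kmeans(X_pos, k)
    c_neg, l_neg = kmeans(X_neg, k)
    # nearest opposite-class centroid distance for each sub-cluster centroid
    d_pos = np.linalg.norm(c_pos[:, None] - c_neg[None], axis=2).min(axis=1)
    d_neg = np.linalg.norm(c_neg[:, None] - c_pos[None], axis=2).min(axis=1)
    near_pos = np.argsort(d_pos)[:keep]   # boundary sub-clusters, + class
    near_neg = np.argsort(d_neg)[:keep]   # boundary sub-clusters, - class
    return X_pos[np.isin(l_pos, near_pos)], X_neg[np.isin(l_neg, near_neg)]
```

On well-separated data, most samples fall in interior sub-clusters far from the boundary and are discarded, while the retained boundary sub-clusters keep the likely support (border) vectors.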
We analyze SVMlight, Platt's SMO, and LIBSVM, which introduce the steepest feasible descent direction, kernel caching, and shrinking strategies. Combining these advantages, this thesis proposes an improved SVM training algorithm: an SMO algorithm based on a Reserve Working Set. The new strategy selects several maximally violating samples from the cache as the Reserve Working Set, which supplies the working sets for the next several optimization steps. This strategy improves the efficiency of the kernel cache and reduces the computational cost of working-set selection.
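The "maximal violating" selection that the proposed strategy builds on can be sketched as below. This follows the standard maximal-violating-pair rule used in LIBSVM-style SMO; the reserve list shown here (simply retaining the top-r violators for reuse in later iterations) is an illustrative reading of the thesis's Reserve Working Set, not its exact algorithm, and the names and parameters are assumptions.

```python
import numpy as np

def select_working_set(grad, y, alpha, C, r=4, eps=1e-3):
    """Maximal-violating-pair working-set selection for SMO, extended with
    a reserve list of the r most violating 'up'-set samples.
    grad is the gradient of the dual objective with respect to alpha."""
    # index sets from the KKT conditions of the dual problem
    up  = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    score = -y * grad                       # violation measure
    up_idx, low_idx = np.where(up)[0], np.where(low)[0]
    i = up_idx[np.argmax(score[up_idx])]    # most violating "up" sample
    j = low_idx[np.argmin(score[low_idx])]  # most violating "low" sample
    if score[i] - score[j] < eps:           # KKT conditions nearly satisfied
        return None, None, []               # converged: no working set
    # reserve: top-r violators, candidates for the next few iterations
    reserve = up_idx[np.argsort(score[up_idx])[::-1][:r]]
    return i, j, list(reserve)
```

Reusing the reserve list means the expensive full scan over the gradient (and the associated kernel-row computations) is amortized over several optimization steps instead of being repeated at every iteration.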
Keywords/Search Tags: support vector machine, training algorithm, sample reducing, working set selection