
Research on Support Vector Machines for Solving Large-Scale Data Sets

Posted on: 2007-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: W M Chen
Full Text: PDF
GTID: 2178360215997509
Subject: Precision instruments and machinery
Abstract/Summary:
The Support Vector Machine (SVM) is a novel machine learning method that has become a focus of machine learning research because of its excellent learning performance. SVMs have been applied successfully in many fields, such as bioinformatics, face detection, and handwritten digit recognition. As a relatively new technique, however, SVMs still have shortcomings that require further research, most notably the high time cost of training on large-scale data sets.

This thesis focuses on training algorithms for SVMs on large-scale data sets. We first introduce the basic concepts of SVM theory. We then discuss the causes of the large memory footprint and slow iterations in SVM training, and review several successful fast training algorithms. Finally, two improved SVM algorithms are proposed, focusing on sample pre-processing and working-set selection. The main contributions are as follows:

1. We formulate SVM training as a large-scale convex quadratic programming problem and summarize the generalization performance of SVMs. We then compare popular fast training algorithms such as Chunking, Decomposition, and SMO.

3. Sample-reduction methods for SVMs are discussed in depth. We analyze existing sample-reduction strategies and classify them into three categories, and then propose a new sample-reduction method based on K-closest sub-clusters. A clustering technique such as K-means is used to find initial sub-clusters; the sub-clusters near the class boundary are then selected as the training set, which contains most of the border vectors. The SVM training set is thus reduced by discarding the majority of insignificant samples, and training is accelerated with little effect on the classification results.

4. Reserve Working Set strategies, which are an important step in SVM training, are discussed.
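The sub-cluster idea in point 3 can be sketched as follows. This is a minimal illustrative sketch, not the thesis's exact K-closest sub-cluster rule: it clusters each class with a plain Lloyd's K-means, then keeps only the samples from the few sub-clusters of each class whose centroids lie closest to the opposite class. The parameters `k` and `keep`, and the centroid-distance selection criterion, are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # distance of every sample to every centroid -> nearest-centroid labels
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def reduce_by_subclusters(X_pos, X_neg, k=5, keep=2):
    """Cluster each class into k sub-clusters, then keep only samples from
    the `keep` sub-clusters per class whose centroids lie closest to the
    other class's centroids (the presumed boundary sub-clusters)."""
    c_pos, l_pos = kmeans(X_pos, k)
    c_neg, l_neg = kmeans(X_neg, k)
    # nearest opposite-class centroid distance for each sub-cluster centroid
    d_pos = np.linalg.norm(c_pos[:, None] - c_neg[None], axis=2).min(axis=1)
    d_neg = np.linalg.norm(c_neg[:, None] - c_pos[None], axis=2).min(axis=1)
    near_pos = np.argsort(d_pos)[:keep]   # boundary sub-clusters, + class
    near_neg = np.argsort(d_neg)[:keep]   # boundary sub-clusters, - class
    return X_pos[np.isin(l_pos, near_pos)], X_neg[np.isin(l_neg, near_neg)]
```

On well-separated data, most samples fall in interior sub-clusters far from the boundary and are discarded, while the retained boundary sub-clusters keep the likely support (border) vectors.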
We analyze SVMlight, Platt's SMO, and LIBSVM, which introduce the steepest feasible descent direction, kernel caching, and shrinking strategies. Combining these advantages, this thesis proposes an improved SVM training algorithm: an SMO algorithm based on a Reserve Working Set. The new strategy selects several maximally violating samples from the cache as the Reserve Working Set, which supplies the working sets for the next several optimization steps. This strategy improves the efficiency of the kernel cache and reduces the computational cost of working-set selection.
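The "maximal violating" selection that the proposed strategy builds on can be sketched as below. This follows the standard maximal-violating-pair rule used in LIBSVM-style SMO; the reserve list shown here (simply retaining the top-r violators for reuse in later iterations) is an illustrative reading of the thesis's Reserve Working Set, not its exact algorithm, and the names and parameters are assumptions.

```python
import numpy as np

def select_working_set(grad, y, alpha, C, r=4, eps=1e-3):
    """Maximal-violating-pair working-set selection for SMO, extended with
    a reserve list of the r most violating 'up'-set samples.
    grad is the gradient of the dual objective with respect to alpha."""
    # index sets from the KKT conditions of the dual problem
    up  = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    score = -y * grad                       # violation measure
    up_idx, low_idx = np.where(up)[0], np.where(low)[0]
    i = up_idx[np.argmax(score[up_idx])]    # most violating "up" sample
    j = low_idx[np.argmin(score[low_idx])]  # most violating "low" sample
    if score[i] - score[j] < eps:           # KKT conditions nearly satisfied
        return None, None, []               # converged: no working set
    # reserve: top-r violators, candidates for the next few iterations
    reserve = up_idx[np.argsort(score[up_idx])[::-1][:r]]
    return i, j, list(reserve)
```

Reusing the reserve list means the expensive full scan over the gradient (and the associated kernel-row computations) is amortized over several optimization steps instead of being repeated at every iteration.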
Keywords/Search Tags: support vector machine, training algorithm, sample reducing, working set selection