
Research On Several Problems In Support Vector Machine And Support Vector Domain Description

Posted on: 2010-04-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Liang
Full Text: PDF
GTID: 1118360302469347
Subject: Applied Mathematics
Abstract/Summary:
Support Vector Machine (SVM) is a supervised pattern recognition method based on Statistical Learning Theory (SLT). Adopting the Structural Risk Minimization (SRM) principle, SVM trades off margin maximization against training-error minimization so as to control the generalization ability of the resulting classifier. SVM handles small-sample, high-dimensional and nonlinear problems well; it offers high classification accuracy, few parameters, globally optimal solutions and strong generalization performance. It has become an active research area in machine learning and has been widely applied to pattern recognition, function regression and density estimation. Focusing on open problems such as SVM training on large-scale samples, SVM ensemble learning, reformulation of SVM, and Support Vector Domain Description (SVDD), this dissertation makes detailed studies that can be summarized in the following five parts.

1. Algorithms for SVM training on large-scale samples. Training an SVM on large-scale samples is difficult in practice, demanding large memory and enormous computation time. Based on the "divide and conquer" idea from parallel learning and the conclusion that the support vectors are equivalent to the whole training set, a concentric hypersphere support vector machine, HSVM for short, is proposed. HSVM divides the positive and negative samples with two groups of concentric hyperspheres of the same number of layers, trains an SVM on the samples between corresponding margins, and combines the support vectors obtained in those margins for the final SVM training. HSVM preserves the classification accuracy of SVM while reducing its training time.

2. Ensemble learning of SVM.
The ensemble learning idea is used to construct a Space Support Vector Domain Classifier (SSVDC). Taking the Support Vector Domain Classifier (SVDC) and K Nearest Neighbor (KNN) as sub-classifiers, SSVDC applies a selective ensemble strategy to obtain the final classification result. First, SSVDC uses SVDD to obtain the minimal enclosing hyperspheres of the positive and negative samples, and divides the training samples into several disconnected regions by the boundaries of the two hyperspheres. Then, for a test sample, it computes the distances to the centers of the two minimal enclosing hyperspheres and, from the relationship between these distances and the radii of the hyperspheres, determines which region the test sample belongs to. Finally, the sub-classifier associated with that region judges its label. Since the sub-classifiers operate on subsets of the data, SSVDC trains quickly; since different sub-classifiers are erected according to the sample distribution, SSVDC attains high target accuracy that varies little with the kernel parameter. Numerical experiments demonstrate the effectiveness of SSVDC and its superiority over SVM and SVDC.

3. Reformulation of SVM. Reformulating SVM broadens its application areas; a reformulation is obtained by changing the terms, variables or coefficients of the objective function of the original optimization problem. Starting from the quadratic-loss-function SVM, a smoothing technique is adopted to construct the Smooth Diagonal Weighted Support Vector Machine (SDWSVM). In the linear space, the smooth model is obtained directly by the smoothing technique, which approximates the slack, written in plus-function form, by the integral of the sigmoid function. In the kernel space, two methods are used to transform the original objective function before the smoothing technique is applied.
One follows traditional SSVM and substitutes the Lagrange multiplier vector for the weight vector of the separating hyperplane; the other uses the duality between the primal and dual programs to derive an expression for the weight vector. For the resulting smooth models in both the linear and the kernel space, a Newton method is proposed to find the global optimum with high efficiency.

4. Reduced Support Vector Domain Description (RSVDD) is presented. Training SVDD requires solving a convex quadratic program whose number of variables equals the number of training samples. To accelerate training, RSVDD defines for each sample a self central distance ratio, the sample's central distance divided by the mean central distance, and takes it as a probability measure for judging whether the sample is a support vector. RSVDD selects the samples with high self central distance ratios to participate in the final SVDD training, thereby reducing the training scale. RSVDD is easy to implement, has few parameters, and maintains high target accuracy with short training time.

5. Confidence Support Vector Domain Description (CSVDD) is proposed. Exploiting the geometric fact that support vectors usually lie around the description boundary, a confidence sampling strategy selects the samples that participate in the SVDD training. A sphere of fixed radius is drawn around each sample; CSVDD counts the number of training samples inside this user-defined sphere and takes the count as a confidence measure for judging whether the center sample is a support vector. Ranking the training samples in ascending order of the confidence measure, CSVDD extracts a certain portion of the top-ranked samples as boundary vectors for the SVDD training.
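As an illustration only, not code from the dissertation, the confidence sampling step just described can be sketched in a few lines of NumPy; the radius, the selected fraction, and the toy data are assumptions made for the example:

```python
import numpy as np

def confidence_sample(X, radius, fraction):
    """Select candidate boundary vectors in the spirit of CSVDD.

    For each sample, count how many training samples fall inside a
    sphere of the given radius centered at it (the confidence measure),
    rank samples in ascending order of that count, and return the
    leading `fraction` of them as candidate boundary vectors.
    """
    # pairwise Euclidean distances between all training samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    counts = (d <= radius).sum(axis=1)          # includes the sample itself
    order = np.argsort(counts, kind="stable")   # ascending confidence
    k = max(1, int(fraction * len(X)))
    return X[order[:k]], order[:k]

# toy data: one dense cluster plus two outlying points (assumed for the demo)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               [[2.0, 2.0], [-2.0, -2.0]]])
boundary, idx = confidence_sample(X, radius=0.5, fraction=0.2)
# the two outliers (indices 20 and 21) have the lowest counts and rank first
```

In a full CSVDD run, `boundary` would then be handed to a standard SVDD solver instead of the whole training set.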
The training time of CSVDD is shorter than that of SVDD, while the target accuracy of the former is identical to that of the latter.
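The self central distance ratio of Part 4 admits a similarly small sketch. This is again illustrative rather than the dissertation's code: the hard selection fraction and the toy data are assumptions (the dissertation treats the ratio as a probability measure rather than a fixed cutoff):

```python
import numpy as np

def self_central_distance_ratio(X):
    """Self central distance ratio in the spirit of RSVDD.

    Each sample's distance to the data center is divided by the mean
    central distance; samples with a high ratio lie far from the center
    and are the most likely support vectors.
    """
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)   # central distances
    return d / d.mean()                      # self central distance ratios

def reduce_for_svdd(X, keep):
    """Keep the fraction of samples with the highest ratios."""
    r = self_central_distance_ratio(X)
    k = max(1, int(keep * len(X)))
    idx = np.argsort(-r)[:k]                 # descending ratio
    return X[idx], idx

# toy data: a tight cluster plus two distant points (assumed for the demo)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(18, 2)),
               [[1.5, 1.5], [-1.5, 1.5]]])
subset, idx = reduce_for_svdd(X, keep=0.2)
# the two distant points (indices 18 and 19) have the highest ratios
```

The reduced `subset` would then replace the full training set in the final SVDD quadratic program, shrinking the number of variables.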
Keywords/Search Tags: Statistical learning theory, Support vector machine, Ensemble learning, Reformulation of SVM, Smoothing technique, Support vector domain description, Self central distance ratio, Confidence measure