
Research On Support Vector Machines And Kernel Methods

Posted on: 2003-07-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L. Zhang    Full Text: PDF
GTID: 1118360095951190    Subject: Circuits and Systems
Abstract/Summary:
Statistical learning theory (SLT) is a tool for studying models of learning from examples. In the early days of research on SLT, the theory attracted little attention because its analysis was purely theoretical. In the mid-1990s, however, Vapnik et al. proposed new learning algorithms based on the developed theory: support vector machines (SVMs). The SVM is a general learning algorithm that has been widely applied in pattern recognition, regression estimation, function approximation, density estimation, and other tasks. This dissertation studies improvements to the support vector algorithm, the construction of admissible support vector kernels, applications of the kernel method, and the construction of a new learning machine based on SLT.

To improve SVMs, center-distance-ratio methods are proposed for pre-extracting support vectors. The SVM learning procedure involves all training examples, yet only a few of them (the support vectors) actually determine the decision function. The method given in this dissertation can extract a verge (boundary) vector set containing all support vectors from a given training set, provided the threshold is chosen reasonably. This greatly reduces the number of training examples and thus improves the training speed of the SVM without loss of generalization performance. For multi-class problems, a decision-tree-based method is presented that reduces a multi-class classification problem to multiple binary classification problems, each of which is solved by an SVM. Moreover, a binary decision tree structure is adopted, so that the number of training examples and the number of classes at the child nodes are smaller than at the parent nodes.
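The center-distance-ratio idea above can be sketched as follows. The function name, the exact ratio, and the threshold value are illustrative assumptions rather than the dissertation's precise formulation: a point whose distance to its own class center is comparable to its distance to the other class center lies near the boundary and is kept as a verge-set candidate.

```python
import numpy as np

def pre_extract_boundary_set(X, y, threshold=0.5):
    """Pre-extract candidate support vectors by a center-distance-ratio test.

    A point whose distance to its own class center is at least `threshold`
    times its distance to the other class center is kept as a boundary
    ("verge") candidate.  This criterion is an illustrative variant, not
    necessarily the dissertation's exact one.
    """
    c_pos = X[y == 1].mean(axis=0)    # center of the positive class
    c_neg = X[y == -1].mean(axis=0)   # center of the negative class
    d_pos = np.linalg.norm(X - c_pos, axis=1)
    d_neg = np.linalg.norm(X - c_neg, axis=1)
    # distance to own center over distance to the other class center
    own = np.where(y == 1, d_pos, d_neg)
    other = np.where(y == 1, d_neg, d_pos)
    ratio = own / np.maximum(other, 1e-12)
    mask = ratio >= threshold
    return X[mask], y[mask]
```

Training an SVM on the returned subset rather than the full set is what yields the claimed speed-up, since deep-interior points (small ratio) are discarded before training.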
Compared with other methods that reduce a multi-class problem to multiple binary problems, our method requires less training work, fewer training examples for each binary problem, and fewer support vectors; it therefore has faster testing speed than previous methods.

In addition, several admissible support vector kernels are proposed, including three coordinate-transform kernels, a wavelet kernel, and a scaling kernel. The coordinate-transform kernels incorporate prior knowledge of the learning problems for which they are efficient. Both the wavelet function and the scaling function have good approximation performance, and the wavelet kernel and the scaling kernel can be regarded as high-dimensional wavelet and scaling functions, respectively; hence both kernels also have good approximation ability. All these kernel functions provide more choices for SVMs and for other algorithms that use the kernel method.

By introducing the kernel method into neural networks and clustering analysis, a wavelet kernel function network (WKFN) and a kernel clustering algorithm are constructed, respectively. Through the kernel mapping, examples in the input space are mapped into a high-dimensional feature space in which linear approximation, recognition, and clustering are performed. A merit of the WKFN is that the curse of dimensionality is avoided: the number of hidden nodes is determined not by the dimension of the examples but by the number of training examples. The WKFN uses the kernel perceptron algorithm and so avoids solving a constrained quadratic program.
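A product-form wavelet kernel of the kind described can be sketched as below, using the Morlet-type mother wavelet h(u) = cos(1.75u)·exp(−u²/2) that is common in the wavelet-SVM literature; the dilation parameter `a` and this particular mother wavelet are assumptions, not necessarily the dissertation's exact choices.

```python
import numpy as np

def wavelet_kernel(x, z, a=1.0):
    """Translation-invariant product-form wavelet kernel.

    K(x, z) = prod_i h((x_i - z_i) / a), with the Morlet-type mother
    wavelet h(u) = cos(1.75 u) * exp(-u**2 / 2).  An illustrative
    construction; the dissertation's kernels may differ in detail.
    """
    u = (np.asarray(x, dtype=float) - np.asarray(z, dtype=float)) / a
    return float(np.prod(np.cos(1.75 * u) * np.exp(-u ** 2 / 2.0)))
```

Because the mother wavelet is even, the kernel is symmetric, and K(x, x) = 1 for any x; such a kernel can be plugged into any kernel machine in place of, say, an RBF kernel.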
Moreover, the solution obtained by the WKFN is globally optimal, and the kernel clustering algorithm achieves better performance than classical clustering algorithms through the kernel mapping.

Finally, a novel learning machine is presented based on differential capacity control, in which the bounds on generalization ability are distribution-dependent. Capacity control is implemented by controlling the differential of the set of functions. The learning machine can use any first-order differentiable functions, including Mercer kernel functions...
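The kernel clustering idea, clustering in the feature space while touching only Gram-matrix entries, can be sketched as a generic kernel k-means. The round-robin initialization and the update rule here are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=20):
    """Kernel k-means on a precomputed Gram matrix K (shape n x n).

    The squared feature-space distance from point i to the mean of
    cluster C is  K[i, i] - 2 * mean(K[i, C]) + mean(K[C, C]),
    so no explicit feature map is ever needed.  A generic sketch,
    with a simple deterministic round-robin initialization.
    """
    n = K.shape[0]
    labels = np.arange(n) % k              # round-robin initial assignment
    for _ in range(n_iter):
        D = np.empty((n, k))
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size == 0:              # empty cluster: never the nearest
                D[:, c] = np.inf
                continue
            D[:, c] = (np.diag(K)
                       - 2.0 * K[:, idx].mean(axis=1)
                       + K[np.ix_(idx, idx)].mean())
        new_labels = D.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With a Gaussian Gram matrix, clusters that are not linearly separable in the input space can still be separated in the feature space, which is the advantage claimed over classical clustering.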
Keywords/Search Tags: Statistical learning theory, Pattern recognition, Regression estimation, Support vector machine, Kernel method, Mercer kernel function, Decision tree, Neural networks, Clustering analysis, Learning machine