
Research On Model Selection For Support Vector Machine

Posted on: 2011-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T H Wang
Full Text: PDF
GTID: 1118360302970477
Subject: Computer software and theory
Abstract/Summary:
The main goal of statistical learning theory (SLT) is to provide a comparatively integrated theoretical basis for studying machine learning problems with a finite number of training examples. The support vector machine (SVM) is a learning algorithm introduced within the framework of SLT. Compared with traditional learning algorithms, SVM copes well with small samples, nonlinearity, overfitting, the curse of dimensionality, and local minima, and it generalizes well to unseen data. SVM has been applied successfully to a wide range of data analysis problems, such as pattern recognition, regression estimation, and probability density estimation. Furthermore, SVM has driven the growing popularity of kernel-based learning methods, which can efficiently analyze nonlinear relationships. SVM and other kernel methods have thus become one of the research focuses of the machine learning community.

It is well known that the performance of SVM depends mostly on the choice of the kernel function and the penalty coefficient (regularization parameter) C. Given a specific problem, selecting the kernel function and regularization parameter is known as the model selection issue. Model selection, especially kernel selection, is one of the central interests in SVM research. In this work, we concentrate on model selection, especially kernel selection, for SVM and attempt a considerably deep exploration of several aspects of this issue. The main contents and contributions of this dissertation are as follows:

1. We systematically summarize statistical learning theory, kernel feature spaces, and SVM, which form the basis of this work. We present these topics concisely while striving to preserve their integrity and systematic structure, and we add some of our own understanding along the way.

2. We explore the semantic interpretation of the SVM parameters and point out that the influence of different features and samples on the classification result can be measured by the kernel parameters and the regularization parameter; hence the investigation of the importance of features and samples for SVM can be reduced to a model selection issue. Based on an analysis of sample-weighted SVM models (such as the Fuzzy SVM), a new model, the feature-weighted SVM (FWSVM for short), is proposed. FWSVM is in essence the combination of feature weighting with SVM; however, we introduce the feature weighting into the construction of the kernel function, so the influence of feature weighting on SVM classification performance can be analyzed from the perspective of the kernel function (see the first sketch below). Theoretical analysis and experimental results show that FWSVM has better generalization ability than the standard SVM.

3. We first systematically summarize the commonly used model selection (especially kernel parameter selection) methods, such as cross-validation, minimizing the leave-one-out (LOO) error or its upper bounds, and optimizing a kernel evaluation measure. We then investigate the geometric significance of kernel polarization and point out that a high kernel polarization value means keeping within-class data pairs close and between-class data pairs far apart. Subsequently, we propose an algorithm for learning general Gaussian kernels by optimizing kernel polarization: the kernel polarization-based gradient ascent algorithm (KPG for short; see the second sketch below). Compared with an optimized standard Gaussian kernel, the general Gaussian kernel adapted by KPG yields better SVM generalization performance. Additionally, we propose a variant of KPG for SVM feature selection, KPFS, which is demonstrated preliminarily on some UCI machine learning benchmark datasets.

4. Inspired by Local Fisher Discriminant Analysis (LFDA), we explore the design of kernel evaluation measures in the presence of multimodality, where samples of the same class form several separate clusters, i.e., the data of the same class have local structure. We point out that the commonly used kernel evaluation measures all neglect the influence of this local structure on classification performance, and that the 'globality' of these measures may leave less freedom for increasing separability. To overcome this disadvantage, we propose a 'localized' kernel evaluation measure: local kernel polarization. By introducing affinity coefficients between data pairs, local kernel polarization preserves to some extent the local structure within each class and hence can further increase the separability of between-class data points (see the third sketch below). Local kernel polarization is demonstrated on some UCI machine learning benchmark datasets.
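First sketch. The abstract does not reproduce the FWSVM formulation itself; the following is a minimal sketch of the underlying idea, i.e., folding per-feature weights into a Gaussian kernel. The kernel form, the weight vector w, the gamma value, and the use of scikit-learn's SVC are illustrative assumptions rather than the author's implementation.

```python
import numpy as np
from sklearn.svm import SVC

def feature_weighted_rbf(w, gamma=1.0):
    """Gaussian kernel with per-feature weights w (assumed form:
    k(x, z) = exp(-gamma * sum_d w_d * (x_d - z_d)^2))."""
    def kernel(X, Z):
        # Scaling each feature by sqrt(w_d) turns the weighted kernel
        # into an ordinary RBF kernel on the rescaled data.
        Xw = X * np.sqrt(w)
        Zw = Z * np.sqrt(w)
        sq = (np.sum(Xw ** 2, axis=1)[:, None]
              - 2.0 * Xw @ Zw.T
              + np.sum(Zw ** 2, axis=1)[None, :])
        return np.exp(-gamma * sq)
    return kernel

# Hypothetical usage: w holds one non-negative weight per feature,
# obtained from any feature-importance heuristic.
# clf = SVC(kernel=feature_weighted_rbf(w, gamma=0.5), C=10.0)
# clf.fit(X_train, y_train)
```

Because the weighting lives inside the kernel, the resulting classifier is still an ordinary SVM, which is what allows the influence of the weights to be analyzed from the kernel-function perspective.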
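Second sketch. Kernel polarization measures the agreement between the Gram matrix K and the ideal label matrix formed by the products y_i * y_j. A minimal sketch of gradient ascent on this quantity for a general (per-feature-width) Gaussian kernel follows; the parametrization, step size, stopping rule, and the assumption of labels in {-1, +1} are all illustrative, and the actual KPG algorithm may differ in detail.

```python
import numpy as np

def polarization_and_grad(theta, X, y):
    """Polarization P = sum_ij y_i * y_j * k(x_i, x_j) and its gradient,
    for an assumed general Gaussian kernel
    k(x, z) = exp(-sum_d theta_d * (x_d - z_d)^2), with y in {-1, +1}."""
    D2 = (X[:, None, :] - X[None, :, :]) ** 2   # (n, n, d) squared differences
    K = np.exp(-D2 @ theta)                     # (n, n) Gram matrix
    Y = np.outer(y, y)                          # +1 within class, -1 between
    P = np.sum(Y * K)
    # dK_ij/dtheta_d = -(x_id - x_jd)^2 * K_ij
    grad = -np.einsum('ij,ijd->d', Y * K, D2)
    return P, grad

def kpg(X, y, steps=200, lr=1e-3):
    """Gradient ascent on kernel polarization (sketch of the KPG idea)."""
    theta = np.full(X.shape[1], 0.1)            # one width per feature
    for _ in range(steps):
        _, g = polarization_and_grad(theta, X, y)
        theta = np.maximum(theta + lr * g, 0.0) # keep widths non-negative
    return theta
```

A feature-selection variant in the spirit of KPFS could, for example, discard features whose learned widths shrink toward zero, though the dissertation's actual criterion is not given in this abstract.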
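Third sketch. The 'localized' measure can be sketched by down-weighting distant same-class pairs with an affinity coefficient, in the spirit of LFDA. The heat-kernel affinity, the sigma parameter, and the specific weighting scheme below are assumptions for illustration; the dissertation's exact affinity coefficients may differ.

```python
import numpy as np

def local_kernel_polarization(K, X, y, sigma=1.0):
    """Localized polarization: same-class pairs are weighted by an affinity
    A_ij = exp(-||x_i - x_j||^2 / sigma^2), so only *nearby* same-class
    pairs are pulled together; between-class pairs keep weight -1, as in
    ordinary polarization. (Assumed weighting, in the spirit of LFDA.)"""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / sigma ** 2)
    W = np.where(y[:, None] == y[None, :], A, -1.0)
    return np.sum(W * K)
```

This scalar can directly replace the global polarization P inside a gradient loop such as kpg above, making the multimodal structure of each class part of the kernel selection objective.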
Keywords/Search Tags: Machine learning, Pattern recognition, Support vector machine (SVM), Model selection, Kernel function, Kernel evaluation