
Research On Multiple Kernel Learning

Posted on: 2014-01-04    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X W Liu    Full Text: PDF
GTID: 1228330479479521    Subject: Computer Science and Technology
Abstract/Summary:
Machine learning techniques have been widely used in many practical applications such as computer vision, natural language processing, bioinformatics, medical image analysis and intrusion detection, to name just a few. As one of the most important families of machine learning techniques, kernel methods provide a powerful and unified learning framework, which allows researchers to focus on developing efficient learning algorithms regardless of the specific data type, such as strings, vectors, texts or graphs. Owing to these merits, kernel methods have been widely applied to different learning tasks, including classification, regression, clustering and ranking. As is well known, the performance of kernel methods depends critically on the choice of kernels and their parameters. However, how to choose an appropriate kernel and the corresponding parameters remains an open issue and lacks principled guidelines. Consequently, research on this issue is worth exploring and of critical importance in applications. The work in this thesis aims to address the above issue by developing several effective kernel methods. Its main contributions are summarized as follows.

(1) We propose to learn an optimal neighborhood kernel within the multiple kernel learning (MKL) framework. This work treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structural parameters of the SVM. Two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian kernel and as a linear combination of a set of base kernels in the manner of MKL. We also give a probabilistic interpretation of our approach and use it to explain existing kernel learning methods, providing another perspective on their commonalities and differences. Comprehensive experimental results on thirteen UCI data sets and two additional real-world data sets show the superior classification performance of the proposed algorithms.

(2) We propose an efficient and practical radius-margin based MKL algorithm. Inspired by the relationship between the radius of the minimal enclosing ball (MEB) and the trace of the total data scatter matrix, this work incorporates the latter into MKL in place of the radius itself. In particular, in order to properly justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the ℓ2-norm soft-margin SVM classifier. Detailed theoretical analysis shows how the proposed approach preserves the merits of incorporating the MEB radius and how the resulting optimization problem is efficiently solved. Moreover, the proposed approach has the following advantages over its counterparts: (i) it is more robust in the presence of outliers or noisy training samples; (ii) it is more computationally efficient, as it avoids the quadratic optimization required to compute the radius at each iteration; and (iii) it can be readily solved with existing off-the-shelf MKL packages. Comprehensive experiments demonstrate the effectiveness and efficiency of our approach.
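To make the radius-related quantity in contribution (2) concrete, the following minimal sketch (Python/NumPy; the function names, the Gaussian base kernels and the lack of normalization are illustrative assumptions, not the thesis code) shows how a combined kernel is formed from base kernels and how the trace of the total data scatter matrix, computable in closed form from the kernel matrix and linear in the kernel weights, can stand in for the squared MEB radius.

```python
import numpy as np

def combined_kernel(base_kernels, mu):
    """MKL combined kernel: K_mu = sum_m mu[m] * K_m."""
    return sum(w * K for w, K in zip(mu, base_kernels))

def scatter_trace(K):
    """Trace of the total data scatter matrix in the feature space induced by K:
    tr(S_T) = tr(K) - (1/n) * 1' K 1 (i.e. the trace of the centered kernel),
    used here as a cheap surrogate for the squared MEB radius."""
    n = K.shape[0]
    return np.trace(K) - K.sum() / n

# Example: three Gaussian base kernels on random data, uniform weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
base_kernels = [np.exp(-d2 / (2.0 * s ** 2)) for s in (0.5, 1.0, 2.0)]
mu = np.ones(3) / 3
K_mu = combined_kernel(base_kernels, mu)
print(scatter_trace(K_mu))
```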
(3) We propose an efficient radius-incorporated MKL algorithm for Alzheimer's disease (AD) prediction. First, we redesign the objective function by approximating the radius with its upper bound, a linear function of the base kernel combination weights. This approximation makes the resulting optimization problem convex and globally solvable. Second, instead of using cross-validation, we model the regularization parameter C of the SVM classifier as an extra kernel combination weight and learn it automatically through MKL. Third, we theoretically show that our optimization problem can be reformulated into the same form as the commonly used SimpleMKL and conveniently solved by off-the-shelf packages. We analyze the factors that contribute to the improved performance of the proposed radius-incorporated algorithm and apply it to discriminate AD patients, MCI converters and MCI non-converters on the benchmark ADNI data set. As experimentally demonstrated, our algorithm better utilizes the radius information and achieves higher prediction accuracy than comparable MKL methods in the literature.

(4) We propose two absent MKL (AMKL) algorithms to address the situation where some channels of a sample are missing. Specifically, we define a margin for each sample in its own relevant space, which corresponds to the observed channels of that sample. The proposed AMKL algorithms then maximize the minimum of all sample-based margins, which leads to a difficult optimization problem. We first provide a two-step iterative algorithm to solve this problem approximately. We then show that the problem can be reformulated as a convex one by applying the representer theorem, which makes it readily solvable with existing convex optimization packages. Comprehensive experiments on five MKL benchmark data sets compare the proposed algorithms with existing imputation-based methods. As observed, our algorithms achieve superior performance, and the improvement becomes more significant as the missing ratio increases.

(5) We propose a sample-adaptive MKL algorithm. Existing MKL algorithms indiscriminately apply the same set of kernel combination weights to all samples. However, the utility of a base kernel can vary across samples, and a base kernel useful for one sample may be noisy for another. In this case, rigidly applying the same set of kernel combination weights can adversely affect learning performance. To improve this situation, we propose a sample-adaptive MKL algorithm in which base kernels are adaptively switched on or off with respect to each sample. We achieve this by assigning a latent binary variable to each base kernel when it is applied to a sample. The kernel combination weights and the latent variables are jointly optimized via the margin maximization principle. As demonstrated on five benchmark data sets, the proposed algorithm consistently outperforms comparable algorithms in the literature.
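As a rough illustration of the sample-adaptive idea in contribution (5), the sketch below (Python/NumPy; the interface and the way the latent indicators enter the combined kernel are a simplified reading rather than the thesis implementation) shows how a binary latent variable per sample and per base kernel can switch that kernel's contribution on or off.

```python
import numpy as np

def sample_adaptive_kernel(base_kernels, mu, z):
    """Sample-adaptive kernel combination.

    base_kernels : list of M (n, n) base kernel matrices
    mu           : (M,) non-negative kernel combination weights
    z            : (n, M) binary latent indicators; z[i, m] = 1 means base
                   kernel m is switched on for sample i

    Base kernel m contributes to entry (i, j) only when it is switched on
    for both samples i and j."""
    K = np.zeros_like(base_kernels[0], dtype=float)
    for m, Km in enumerate(base_kernels):
        active = np.outer(z[:, m], z[:, m])  # 1 where kernel m is on for both samples
        K += mu[m] * Km * active
    return K

# Toy usage: two base kernels, second kernel switched off for sample 0.
K1 = np.ones((3, 3))
K2 = np.eye(3)
mu = np.array([0.7, 0.3])
z = np.array([[1, 0],
              [1, 1],
              [1, 1]])
print(sample_adaptive_kernel([K1, K2], mu, z))
```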
(6) We propose a general learning framework, termed multiple kernel extreme learning machines (MK-ELM), to address kernel selection and the integration of multiple heterogeneous data sources. Based on this framework, we develop three MK-ELM algorithms: sparse, non-sparse and radius-incorporated variants. Three efficient optimization algorithms are proposed to solve the corresponding kernel learning problems. As the experimental results indicate, the proposed algorithms achieve comparable or even better classification performance than state-of-the-art MKL algorithms, while incurring much less computational cost.

(7) We propose a similarity and structure preservation framework for feature selection. We observe that, besides preserving pairwise sample similarity, the local geometric structure of the data is also important for feature selection, and that these two factors play different roles in different feature selection scenarios. Following this idea, we propose a global and local similarity preservation framework (GLSPFS), which unifies supervised, unsupervised and semi-supervised feature selection in a common formulation. This framework allows us to systematically analyze the effects of global pairwise similarity preservation and local geometric structure preservation on feature selection. Under this framework, we instantiate three specific feature selection algorithms by using locally linear embedding (LLE), locality preserving projection (LPP) and local tangent space alignment (LTSA), respectively, to capture the local geometry of the data. An efficient optimization algorithm with proven global convergence is developed to solve the resulting optimization problem. An extensive experimental study compares our methods with many state-of-the-art feature selection algorithms in different learning scenarios. We observe that: (i) our methods consistently achieve statistically significant improvements in selection performance; (ii) in both supervised and semi-supervised feature selection, global pairwise similarity preservation is more important than local geometric structure preservation; and (iii) in unsupervised feature selection, preserving the local geometric structure becomes more critical. In sum, our work not only demonstrates the advantages of the proposed GLSPFS framework, but also provides more insight into what should be preserved in different feature selection scenarios.
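To give a flavor of the global pairwise similarity preservation criterion behind GLSPFS in contribution (7), here is a small sketch (Python/NumPy; the Gaussian similarity, the Frobenius-norm discrepancy and the function names are illustrative assumptions, not the exact GLSPFS objective) that scores a candidate feature subset by how closely the similarity matrix computed from the selected features matches the one computed from all features.

```python
import numpy as np

def gaussian_similarity(Z, sigma=1.0):
    """Pairwise Gaussian (RBF) similarity matrix of the rows of Z."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def similarity_preservation_error(X, selected, sigma=1.0):
    """Frobenius-norm discrepancy between the full-feature similarity matrix
    and the one built from the selected features only (smaller is better)."""
    S_full = gaussian_similarity(X, sigma)
    S_sel = gaussian_similarity(X[:, selected], sigma)
    return np.linalg.norm(S_full - S_sel, "fro") ** 2

# Toy usage: compare two candidate feature subsets on random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 6))
print(similarity_preservation_error(X, [0, 1, 2]))
print(similarity_preservation_error(X, [3, 4, 5]))
```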
Keywords/Search Tags: optimal neighborhood kernel, multiple kernel learning, margin maximization, minimum enclosing ball, absent data learning, latent multiple kernel learning, similarity preservation, feature selection, extreme learning machine