
A Variance Minimization Criterion For Manifold Learning

Posted on: 2013-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
Full Text: PDF
GTID: 2218330371958955
Subject: Computer application technology
Abstract/Summary:
In many modern machine learning tasks, the features are very high-dimensional and the labels are expensive to obtain. Learning from a small sample of high-dimensional data usually leads to severe overfitting. There are two popular remedies. One is to use active learning and semi-supervised learning to maximize the utility of the labels and exploit the value of unlabeled data. The other is manifold learning, which analyzes the intrinsic geometry of the data distribution. Its key observation is that many high-dimensional real-world datasets lie on a low-dimensional manifold whose intrinsic dimension is much lower than that of the ambient Euclidean space; by working with the intrinsic geometry, we can effectively avoid the curse of dimensionality.

In this thesis, we develop a general framework that unifies the two remedies. We explicitly take the intrinsic manifold structure into account and seek the most stable learning process under a variance minimization criterion. Specifically, we work with Laplacian regularized semi-supervised learning, one of the most popular semi-supervised learning paradigms built on the cluster assumption. We then analyze the statistical properties of the learned model and propose a criterion that minimizes its variance.

With this criterion, we derive concrete feature selection and active learning algorithms. The feature selection algorithm effectively reduces the dimension of the data while preserving the interpretability of the original features; the active learning algorithm maximizes the value of the labels. Because both are designed under the variance minimization criterion, the resulting models are stable and resist overfitting. We also derive efficient approximation procedures for the otherwise NP-hard objective functions. Our algorithms are evaluated extensively on real-world datasets and outperform previous methods.
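The abstract itself gives no formulas or code, so the following Python sketch is only a rough illustration of the ideas it names: a Laplacian regularized least squares model (a standard instance of Laplacian regularized semi-supervised learning) and a greedy active-learning selector that minimizes the trace of the regularized inverse covariance of the model, an experimental-design-style variance criterion. The RBF graph construction, the linear-model form, and all function names are assumptions made for illustration, not the thesis's exact formulation.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    # Pairwise squared distances -> RBF affinity matrix W.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W.
    return np.diag(W.sum(axis=1)) - W

def laprls_fit(X, y_labeled, labeled_idx, lam=1e-2, mu=1e-2, gamma=1.0):
    """Laplacian regularized least squares with a linear model w:
       min_w ||X_L w - y||^2 + lam ||w||^2 + mu * w^T X^T L X w."""
    L = graph_laplacian(rbf_affinity(X, gamma))
    XL = X[labeled_idx]
    A = XL.T @ XL + lam * np.eye(X.shape[1]) + mu * X.T @ L @ X
    w = np.linalg.solve(A, XL.T @ y_labeled)
    return w, A

def greedy_variance_min(X, k, lam=1e-2, mu=1e-2, gamma=1.0):
    """Greedily pick k points to label so that trace(A^{-1}) is small,
       i.e. the variance of the learned w is (approximately) minimized.
       A greedy surrogate for an otherwise combinatorial objective."""
    L = graph_laplacian(rbf_affinity(X, gamma))
    A = lam * np.eye(X.shape[1]) + mu * X.T @ L @ X
    chosen = []
    for _ in range(k):
        Ainv = np.linalg.inv(A)
        best, best_val = None, np.inf
        for i in range(X.shape[0]):
            if i in chosen:
                continue
            x = X[i:i + 1].T  # column vector (d, 1)
            Ax = Ainv @ x
            # Sherman-Morrison: trace((A + x x^T)^{-1})
            val = np.trace(Ainv) - float(x.T @ Ainv @ Ax) / (1.0 + float(x.T @ Ax))
            if val < best_val:
                best, best_val = i, val
        chosen.append(best)
        A = A + X[best:best + 1].T @ X[best:best + 1]
    return chosen

if __name__ == "__main__":
    # Toy usage: select 5 points to label, then fit the regularized model.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    picked = greedy_variance_min(X, k=5)
    y = rng.normal(size=5)  # stand-in for the queried labels
    w, _ = laprls_fit(X, y, picked)
    print(picked, w)
```

Under the same assumptions, the thesis's feature selection variant would apply the analogous trace criterion to columns of X (features) rather than rows (examples), keeping the selected features in their original, interpretable form.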
Keywords/Search Tags: Experimental Design, Active Learning, Semi-supervised Learning, Feature Selection