
A Variance Minimization Criterion For Manifold Learning

Posted on: 2013-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
Full Text: PDF
GTID: 2218330371958955
Subject: Computer application technology
Abstract/Summary:
In many modern machine learning tasks, the features are very high-dimensional and the labels are expensive to obtain. Learning from a small sample of high-dimensional data usually leads to severe overfitting. There are two popular remedies. One is to use active learning and semi-supervised learning to maximize the utility of the labels and exploit the value of unlabeled data. The other is manifold learning, which analyzes the intrinsic geometry of the data distribution. Its key observation is that many high-dimensional real-world datasets lie on a low-dimensional manifold whose intrinsic dimension is much lower than that of the ambient Euclidean space; by working with the intrinsic geometry, we can effectively avoid the curse of dimensionality.

In this thesis, we develop a general framework that unifies the two remedies. We explicitly take the intrinsic manifold structure into account and seek the most stable learning process under a variance minimization criterion. Specifically, we work with Laplacian regularized semi-supervised learning, one of the most popular semi-supervised learning paradigms built on the cluster assumption. We then analyze the statistical properties of the learned model and propose a criterion that minimizes its variance.

With this criterion, we derive concrete feature selection and active learning algorithms. The feature selection algorithm effectively reduces the dimension of the data while preserving the interpretability of the original features; the active learning algorithm maximizes the value of the labels. Because both are designed under the variance minimization criterion, the resulting models are stable and resist overfitting. We also derive efficient approximation procedures for the otherwise NP-hard objective functions. Our algorithms are evaluated extensively on real-world datasets and outperform previous methods.
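The abstract itself gives no formulas or code, so the following Python sketch is only a rough illustration of the ideas it names: a Laplacian regularized least squares model (a standard instance of Laplacian regularized semi-supervised learning) and a greedy active-learning selector that minimizes the trace of the regularized inverse covariance of the model, an experimental-design-style variance criterion. The RBF graph construction, the linear-model form, and all function names are assumptions made for illustration, not the thesis's exact formulation.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    # Pairwise squared distances -> RBF affinity matrix W.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W.
    return np.diag(W.sum(axis=1)) - W

def laprls_fit(X, y_labeled, labeled_idx, lam=1e-2, mu=1e-2, gamma=1.0):
    """Laplacian regularized least squares with a linear model w:
       min_w ||X_L w - y||^2 + lam ||w||^2 + mu * w^T X^T L X w."""
    L = graph_laplacian(rbf_affinity(X, gamma))
    XL = X[labeled_idx]
    A = XL.T @ XL + lam * np.eye(X.shape[1]) + mu * X.T @ L @ X
    w = np.linalg.solve(A, XL.T @ y_labeled)
    return w, A

def greedy_variance_min(X, k, lam=1e-2, mu=1e-2, gamma=1.0):
    """Greedily pick k points to label so that trace(A^{-1}) is small,
       i.e. the variance of the learned w is (approximately) minimized.
       A greedy surrogate for an otherwise combinatorial objective."""
    L = graph_laplacian(rbf_affinity(X, gamma))
    A = lam * np.eye(X.shape[1]) + mu * X.T @ L @ X
    chosen = []
    for _ in range(k):
        Ainv = np.linalg.inv(A)
        best, best_val = None, np.inf
        for i in range(X.shape[0]):
            if i in chosen:
                continue
            x = X[i:i + 1].T  # column vector (d, 1)
            Ax = Ainv @ x
            # Sherman-Morrison: trace((A + x x^T)^{-1})
            val = np.trace(Ainv) - float(x.T @ Ainv @ Ax) / (1.0 + float(x.T @ Ax))
            if val < best_val:
                best, best_val = i, val
        chosen.append(best)
        A = A + X[best:best + 1].T @ X[best:best + 1]
    return chosen

if __name__ == "__main__":
    # Toy usage: select 5 points to label, then fit the regularized model.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    picked = greedy_variance_min(X, k=5)
    y = rng.normal(size=5)  # stand-in for the queried labels
    w, _ = laprls_fit(X, y, picked)
    print(picked, w)
```

Under the same assumptions, the thesis's feature selection variant would apply the analogous trace criterion to columns of X (features) rather than rows (examples), keeping the selected features in their original, interpretable form.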
Keywords/Search Tags: Experimental Design, Active Learning, Semi-supervised Learning, Feature Selection