New directions in semi-supervised learning

Posted on:2011-02-27

Degree:Ph.D

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Goldberg, Andrew Brian

GTID:1448390002455277

Subject:Computer Science

Abstract/Summary:

In many real-world learning scenarios, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning (SSL) is the machine learning paradigm concerned with utilizing unlabeled data to try to build better classifiers and regressors. Unlabeled data is a powerful resource, yet SSL can be difficult to apply in practice. The objective of this dissertation is to move the field toward more practical and robust SSL. This is accomplished by several key contributions.

First, we introduce the online (and active) semi-supervised learning setting, which considers large amounts of mostly unlabeled data arriving constantly over time. An online SSL classifier must be able to make efficient predictions at any moment and update itself in response to labeled and unlabeled data. Previously, almost all SSL assumed a fixed data set was available before training began, and receiving new data meant retraining a potentially slow model. We present two families of online semi-supervised learners that reformulate the popular manifold and cluster assumptions into theoretically motivated and efficient online learning algorithms.

We also invent several novel model assumptions and corresponding algorithms for the classic batch SSL setting. Principled in nature, these assumptions are geared toward making SSL easier to apply to a wider variety of situations in the real world. Many SSL algorithms construct a graph over the data to approximate an assumed (single) underlying low-dimensional manifold. In contrast, our novel multi-manifold assumption handles data lying on multiple manifolds that may differ in dimensionality, orientation, and density. The work also introduces a novel low-rank assumption, based on recent developments in matrix completion, that enables multi-label transduction with many unobserved features. Other contributions utilize several new forms of weak side information, such as dissimilarity relationships or order preferences over predictions. Finally, SSL is applied to sentiment or opinion analysis, exploring domain-specific assumptions and graphs to extend SSL to this challenging area of natural language processing.

The dissertation provides extensive experimental results demonstrating that these novel SSL learning settings and modeling assumptions lead to algorithms with significant performance benefits in computer vision, text classification, bioinformatics, and other prediction tasks.
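
As an illustration of the generic graph-based approach the abstract contrasts itself with (a single graph built over all of the data), the following is a minimal sketch of iterative label propagation on a Gaussian-weighted similarity graph. It is not one of the dissertation's algorithms. The function names `gaussian_weights` and `propagate_labels`, the bandwidth `sigma`, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Dense similarity graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), w_ii = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(points, labels, sigma=1.0, iters=200):
    """Spread binary labels over the graph.

    labels maps a point index to 0 or 1 for the labeled points only.
    Labeled scores stay clamped; each unlabeled score is repeatedly
    replaced by the weighted average of its neighbors' scores.
    Returns one score in [0, 1] per point.
    """
    w = gaussian_weights(points, sigma)
    n = len(points)
    f = [float(labels.get(i, 0.5)) for i in range(n)]  # 0.5 = "unknown"
    for _ in range(iters):
        new_f = list(f)
        for i in range(n):
            if i in labels:
                continue  # keep labeled points fixed
            total = sum(w[i])
            if total > 0:
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f


# Hypothetical toy data: two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = {0: 0, 3: 1}
scores = propagate_labels(points, labels, sigma=0.5)
predictions = [1 if s > 0.5 else 0 for s in scores]
print(predictions)
```

On this toy data each unlabeled point ends up with the label of the labeled point in its own cluster, which is the cluster/manifold intuition in its simplest form. The sketch rebuilds nothing incrementally and uses a dense graph, so it also shows why a fixed, batch formulation is awkward when data arrive continuously.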

Keywords/Search Tags:

SSL, Semi-supervised, Data, Assumptions, New, Algorithms

Related items

1	Research Of Semi-supervised Classification Learning Framework Based On Multi Assumptions
2	Research Of Reliable Semi-supervised Classification
3	The Research Of Mst Variant-based Semi-supervised Clustering Algorithms
4	Research On Semi-supervised Clustering And Classification Algorithm
5	Semi-supervised Clustering Algorithms For Streaming And Multidensity Data
6	Online Semi-Supervised Learning Theory, Algorithms And Applications
7	Graph Based Semi-supervised Learning Algorithms And Applications
8	Research On Semi-Supervised Clustering Algorithms Based On Rough Set
9	Semi-supervised Learning On Text Data
10	Research And Implementation Of Semi-supervised Machine Learning Algorithms For Classifying The Imbalanced Protocol Flows