
A new framework for semisupervised, multitask learning

Posted on: 2010-05-17
Degree: Ph.D
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Loeff, Nicolas
Full Text: PDF
GTID: 1448390002475666
Subject: Engineering
Abstract/Summary:
Labeling image collections is a tedious task, especially when multiple labels have to be chosen for each image. On the other hand, the explosion of Internet content has provided cheap access to almost unlimited amounts of data, albeit with lower-quality annotations. In this dissertation we introduce a new framework that extends state-of-the-art models in word prediction to incorporate information from two sources: (1) unlabeled examples: data points without annotations, but lying on a manifold of lower dimensionality than the input space; and (2) correlated labels: examples with multiple labels, some of which tend to co-occur. Both are common in image annotation tasks. To the best of our knowledge, this is the first semisupervised multitask model applied to vision problems of this kind.

We begin by introducing a new model for semisupervised learning based on a boosting formulation that, unlike most semisupervised approaches, is fast, efficient, and scales to millions of unlabeled examples. It is flexible enough to handle the full range of supervision, from fully supervised multiclass classification to unsupervised clustering. The framework is general and accepts many different function-approximation techniques. It produces state-of-the-art results on standard problems and also tackles problems that were infeasible with previous approaches.

Next we present a novel max-margin framework for multitask learning that exploits the natural correlation between labels in image collections, resulting in improved word prediction and good annotation completion. It forces classifiers to share features and thus finds a low-dimensional latent space with high discriminative power for correlated labels. In subsequent chapters, we integrate part of the manifold framework with the multitask formulation to extend it to semisupervised learning.
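The feature-sharing idea can be sketched as factorizing the task weight matrix through a low-rank latent space. The sketch below is illustrative only: it uses synthetic data, a squared-loss surrogate solved by alternating least squares rather than the max-margin objective, and hypothetical dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for annotated images: 200 examples, 20 features,
# 5 binary labels whose true weights share a rank-2 latent structure,
# mimicking labels that tend to co-occur.
n, d, t, k = 200, 20, 5, 2
U_true = rng.normal(size=(d, k))
V_true = rng.normal(size=(k, t))
X = rng.normal(size=(n, d))
Y = np.sign(X @ U_true @ V_true + 0.1 * rng.normal(size=(n, t)))

# Force the tasks to share features by factorizing the weight matrix
# W = U V through a k-dimensional latent space, alternating exact
# least-squares refits of each factor.
U = rng.normal(size=(d, k))
for _ in range(10):
    V = np.linalg.lstsq(X @ U, Y, rcond=None)[0]                  # task weights in latent space
    U = np.linalg.lstsq(X, Y @ np.linalg.pinv(V), rcond=None)[0]  # refit shared projection

acc = (np.sign(X @ U @ V) == Y).mean()
print(f"mean training sign accuracy across tasks: {acc:.2f}")
```

Because all five classifiers are constrained to act through the same two-dimensional projection `X @ U`, correlated labels reinforce each other's features, which is the intuition behind the shared latent space described above.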
Our max-margin formulation shares features between tasks and also propagates label information in order to learn from unlabeled examples. Our experiments show that unlabeled data contributes substantially to the classifier's performance, especially when the number of labeled examples is low.

To conclude, we interpret the internal representation of the model and use it to perform unsupervised scene discovery. Defining a meaningful vocabulary for scene discovery is a challenging problem with important consequences for object recognition. We consider scenes to depict correlated objects and to exhibit visual similarity. We postulate that the internal representation space of our model should allow us to discover a large number of scenes in unannotated data; we show scene discrimination results on par with supervised approaches even without explicitly labeling scenes, producing highly plausible scene clusters.
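The scene-discovery step can be sketched as plain clustering in the model's internal representation space. Everything below is an illustrative assumption, not the dissertation's actual model: synthetic 2-D latent codes stand in for the learned representations, and a small k-means routine with farthest-point initialization stands in for the discovery procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent codes: three "scenes", each a tight cluster of
# representations for images whose object labels co-occur.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]])
codes = np.vstack([c + 0.3 * rng.normal(size=(40, 2)) for c in centers])
truth = np.repeat(np.arange(3), 40)  # held out, used only for evaluation

# Farthest-point initialization, then plain k-means: scenes emerge as
# clusters without any scene labels being provided.
k = 3
cent = [codes[0]]
for _ in range(k - 1):
    d2 = np.min(((codes[:, None] - np.array(cent)[None]) ** 2).sum(-1), axis=1)
    cent.append(codes[np.argmax(d2)])
cent = np.array(cent)
for _ in range(20):
    assign = np.argmin(((codes[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    cent = np.array([codes[assign == j].mean(axis=0) for j in range(k)])

# Purity: fraction of points agreeing with their cluster's majority scene.
purity = 0.0
for j in range(k):
    members = truth[assign == j]
    if members.size:
        purity += np.bincount(members).max()
purity /= len(codes)
print(f"cluster purity: {purity:.2f}")
```

High purity here only shows that clustering recovers well-separated groups; the claim in the text is that the learned representation space actually places images of the same scene into such groups.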
Keywords/Search Tags: Framework, Semisupervised, Multitask, Labels, New, Image, Scene