
A new framework for semisupervised, multitask learning

Posted on: 2010-05-17
Degree: Ph.D
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Loeff, Nicolas
Full Text: PDF
GTID: 1448390002475666
Subject: Engineering
Abstract/Summary:
Labeling image collections is a tedious task, especially when multiple labels have to be chosen for each image. On the other hand, the explosion of Internet content has provided cheap access to almost unlimited amounts of data, albeit with lower-quality annotations. In this dissertation we introduce a new framework that extends state-of-the-art models in word prediction to incorporate information from two sources: (1) unlabeled examples: data points without annotations, but lying on a manifold of lower dimensionality than the input space; and (2) correlated labels: examples with multiple labels, some of which tend to co-occur. Both are common in image annotation tasks. To the best of our knowledge, this is the first semisupervised multitask model applied to vision problems of this kind.

We begin by introducing a new model for semisupervised learning based on a boosting formulation that, unlike most semisupervised approaches, is fast, efficient, and scales to millions of unlabeled examples. It is flexible enough to handle the full range of supervision, from fully supervised multiclass classification to unsupervised clustering. The framework is general and accepts many different function-approximation techniques. It produces state-of-the-art results on standard problems and also tackles problems that were infeasible with previous approaches.

Next we present a novel max-margin framework for multitask learning that exploits the natural correlation between labels in image collections, resulting in improved word prediction and good annotation completion. It forces classifiers to share features and thus finds a low-dimensional latent space with high discriminative power for correlated labels. In subsequent chapters, we integrate part of the manifold framework with the multitask formulation to extend it to semisupervised learning.
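The feature-sharing idea can be sketched as factorizing the task weight matrix through a low-rank latent space. The sketch below is illustrative only: it uses synthetic data, a squared-loss surrogate solved by alternating least squares rather than the max-margin objective, and hypothetical dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for annotated images: 200 examples, 20 features,
# 5 binary labels whose true weights share a rank-2 latent structure,
# mimicking labels that tend to co-occur.
n, d, t, k = 200, 20, 5, 2
U_true = rng.normal(size=(d, k))
V_true = rng.normal(size=(k, t))
X = rng.normal(size=(n, d))
Y = np.sign(X @ U_true @ V_true + 0.1 * rng.normal(size=(n, t)))

# Force the tasks to share features by factorizing the weight matrix
# W = U V through a k-dimensional latent space, alternating exact
# least-squares refits of each factor.
U = rng.normal(size=(d, k))
for _ in range(10):
    V = np.linalg.lstsq(X @ U, Y, rcond=None)[0]                  # task weights in latent space
    U = np.linalg.lstsq(X, Y @ np.linalg.pinv(V), rcond=None)[0]  # refit shared projection

acc = (np.sign(X @ U @ V) == Y).mean()
print(f"mean training sign accuracy across tasks: {acc:.2f}")
```

Because all five classifiers are constrained to act through the same two-dimensional projection `X @ U`, correlated labels reinforce each other's features, which is the intuition behind the shared latent space described above.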
Our max-margin formulation shares features between tasks and also propagates label information in order to learn from unlabeled examples. Our experiments show that unlabeled data contributes substantially to the classifier's performance, especially when the number of labeled examples is low.

To conclude, we interpret the internal representation of the model and use it to perform unsupervised scene discovery. Defining a meaningful vocabulary for scene discovery is a challenging problem with important consequences for object recognition. We consider scenes to depict correlated objects and to exhibit visual similarity. We postulate that the internal representation space of our model should allow us to discover a large number of scenes in unannotated data; we show scene discrimination results on par with supervised approaches even without explicitly labeling scenes, producing highly plausible scene clusters.
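The scene-discovery step can be sketched as plain clustering in the model's internal representation space. Everything below is an illustrative assumption, not the dissertation's actual model: synthetic 2-D latent codes stand in for the learned representations, and a small k-means routine with farthest-point initialization stands in for the discovery procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent codes: three "scenes", each a tight cluster of
# representations for images whose object labels co-occur.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]])
codes = np.vstack([c + 0.3 * rng.normal(size=(40, 2)) for c in centers])
truth = np.repeat(np.arange(3), 40)  # held out, used only for evaluation

# Farthest-point initialization, then plain k-means: scenes emerge as
# clusters without any scene labels being provided.
k = 3
cent = [codes[0]]
for _ in range(k - 1):
    d2 = np.min(((codes[:, None] - np.array(cent)[None]) ** 2).sum(-1), axis=1)
    cent.append(codes[np.argmax(d2)])
cent = np.array(cent)
for _ in range(20):
    assign = np.argmin(((codes[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    cent = np.array([codes[assign == j].mean(axis=0) for j in range(k)])

# Purity: fraction of points agreeing with their cluster's majority scene.
purity = 0.0
for j in range(k):
    members = truth[assign == j]
    if members.size:
        purity += np.bincount(members).max()
purity /= len(codes)
print(f"cluster purity: {purity:.2f}")
```

High purity here only shows that clustering recovers well-separated groups; the claim in the text is that the learned representation space actually places images of the same scene into such groups.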
Keywords/Search Tags: Framework, Semisupervised, Multitask, Labels, New, Image, Scene