
Exploitation of unlabeled data and related tasks in semi-supervised learning

Posted on: 2008-06-23
Degree: Ph.D
Type: Thesis
University: Duke University
Candidate: Liu, Qiuhua
Full Text: PDF
GTID: 2448390005973488
Subject: Engineering
Abstract/Summary:
Supervised learning has proven an effective technique for learning a classifier when there is enough labeled data. Unfortunately, in many applications a generous supply of labeled data is not available because of the high cost of labeling a datum. Supervised algorithms are known to generalize poorly when labeled data are limited. There has been much recent work on semi-supervised learning and multitask learning; both try to improve the generalization of classifiers by using information sources beyond the labeled data.

In this thesis, we design two semi-supervised algorithms, termed parameterized neighborhood-based classification (PNBC) and label iteration, that efficiently exploit the data manifold information provided by both the labeled and unlabeled data to improve generalization. PNBC represents the probability of a label at a given data point by mixing over all data points in a neighborhood, which is formed via a Markov random walk over the entire data manifold. Label iteration is a very simple algorithm that has a closed-form solution in the limit. Experimental results demonstrate the effectiveness of both algorithms. Based on PNBC, we further propose an efficient active learning procedure for the unexploded ordnance (UXO) detection problem, employing the mutual-information criterion.

With PNBC as a building block, we make the first attempt to integrate the benefits offered by both semi-supervised learning and multitask learning (MTL), by proposing semi-supervised multitask learning. In the semi-supervised MTL setting, we have M partially labeled data manifolds, each defining a classification task and involving the design of a PNBC classifier. The M PNBC classifiers are designed simultaneously within a unified sharing structure. The superior performance of semi-supervised MTL on real sensing applications demonstrates that manifold information and information from related tasks can play positive and complementary roles, suggesting that one can find significant benefits in practice by performing semi-supervised MTL.
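
As a rough illustration of the random-walk and label-iteration ideas mentioned in the abstract, the sketch below builds a row-normalized transition matrix from a Gaussian kernel, iterates labels over it, and compares the result with the closed-form limit of that iteration. The abstract does not give the thesis's formulation, so the kernel, the damping parameter `alpha`, the update rule (a standard label-propagation scheme), the toy data, and all function names are assumptions made for illustration; this is not the author's PNBC or label-iteration algorithm.

```python
import numpy as np


def transition_matrix(X, sigma=1.0):
    """Row-stochastic random-walk matrix from a Gaussian kernel (assumed form)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return W / W.sum(axis=1, keepdims=True)


def iterate_labels(A, Y, alpha=0.9, n_iter=500):
    """Repeat F <- alpha * A @ F + (1 - alpha) * Y (assumed update rule)."""
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (A @ F) + (1.0 - alpha) * Y
    return F


def closed_form_limit(A, Y, alpha=0.9):
    """Limit of the iteration above: (1 - alpha) * (I - alpha * A)^{-1} Y."""
    n = A.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * A, Y)


# Toy data (hypothetical): two well-separated clusters, one labeled point each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
Y = np.zeros((20, 2))
Y[0, 0] = 1.0   # labeled example of class 0
Y[10, 1] = 1.0  # labeled example of class 1

A = transition_matrix(X, sigma=0.5)
F_iter = iterate_labels(A, Y)
F_limit = closed_form_limit(A, Y)
predicted = F_limit.argmax(axis=1)  # class with the larger propagated score
```

In this sketch the closed form is obtained by solving a linear system rather than forming an explicit inverse, which is usually the more stable choice; with `alpha` below 1 the iteration converges because the transition matrix is row-stochastic.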
Keywords/Search Tags: Labeled data, Semi-supervised, PNBC