Semi-Supervised Learning Based On Feature Space Transformation

Posted on: 2010-08-13 | Degree: Master | Type: Thesis
Country: China | Candidate: W Zhang | Full Text: PDF
GTID: 2178360302459792 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The goal of semi-supervised learning is to obtain a good learner from a small amount of labeled data and a large amount of unlabeled data. Co-training is one of the best-known algorithms in the field of semi-supervised learning. It proceeds as follows: two classifiers are trained on two separate feature sets (views) of the small labeled set. Each classifier then classifies the unlabeled data, selects the p positive and n negative examples it labels with the highest confidence, and adds them as newly labeled data to the training set of the other classifier. In this way, each classifier is refined over the course of the procedure. Theoretical analysis indicates that this procedure can strengthen a classifier.

However, co-training is feasible only when certain assumptions are satisfied. The first assumption is that the features in either view are conditionally independent of the features in the other view, given the class of the sample. The second assumption is that each of the two views is of sufficiently high quality for classification. Most data sets do not satisfy these two strong assumptions. We therefore propose a new feature set division algorithm. Its general idea is to project the original data set into a subspace in which all features are orthogonal to each other, and then to apply either a greedy two-view feature selection strategy or an energy-difference driven strategy to the subspace data set to obtain two high-quality views. To measure the quality of each view, we introduce an energy function of a view based on the eigenvalues corresponding to the features in that view. Experiments validate the effectiveness of the algorithm.

Moreover, we propose a co-training based regression algorithm (SSRFT), in which two very different regressors are used to meet the constraint of co-training regression. SSRFT was also applied to web document classification. Experiments also validate the effectiveness of SSRFT...
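The feature set division described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact energy function or the exact greedy rule, so this sketch makes two assumptions: the energy of a view is the sum of the eigenvalues of its features, and the greedy strategy assigns each feature, in order of decreasing eigenvalue, to the view that currently has less energy. The names `view_energy` and `greedy_two_view_split` are hypothetical. The eigenvalues are assumed to come from a K-L transform of the data, for example an eigendecomposition of the covariance matrix.

```python
from typing import List, Sequence, Tuple


def view_energy(eigenvalues: Sequence[float], view: Sequence[int]) -> float:
    """Energy of a view. ASSUMPTION: the sum of the eigenvalues of its features."""
    return sum(eigenvalues[i] for i in view)


def greedy_two_view_split(eigenvalues: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Split projected features into two views of roughly equal energy.

    ASSUMPTION: features are visited from largest to smallest eigenvalue,
    and each one goes to the view that currently has less energy.
    Returns the feature indices assigned to each view.
    """
    view_a: List[int] = []
    view_b: List[int] = []
    energy_a = 0.0
    energy_b = 0.0

    # Indices of the features, sorted by decreasing eigenvalue.
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)

    for i in order:
        if energy_a <= energy_b:
            view_a.append(i)
            energy_a += eigenvalues[i]
        else:
            view_b.append(i)
            energy_b += eigenvalues[i]
    return view_a, view_b


if __name__ == "__main__":
    # Illustrative eigenvalues only; these are not data from the thesis.
    eigvals = [5.0, 3.0, 2.0, 1.0, 0.5]
    a, b = greedy_two_view_split(eigvals)
    print("view A:", a, "energy:", view_energy(eigvals, a))
    print("view B:", b, "energy:", view_energy(eigvals, b))
```

Because the projected features are mutually orthogonal, any partition of them yields two uncorrelated views. Balancing the energy is one plausible way to keep both views informative enough for classification, which is the second co-training assumption.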
Keywords/Search Tags: semi-supervised learning, co-training, feature set division, K-L transform, energy function, web document classification