
Several Research Topics On Learning With Side-Information

Posted on: 2006-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhou
Full Text: PDF
GTID: 2178360182483512
Subject: Control Science and Engineering
Abstract/Summary:
This thesis addresses several research topics on learning with side-information. Unlike the term used in the field of communication, side-information here refers to instance-level constraints indicating whether two samples originate from the same, unknown category (positive constraints) or from two different categories (negative constraints). Side-information is therefore a more general form of supervision than sample labels, so designing learning algorithms based on it is of both theoretical and practical significance. First, for unsupervised algorithms such as clustering, introducing side-information leads to more meaningful results. Second, sample pairs carry useful information about sample variations, which can be exploited to design classifiers. The work in this thesis includes the following aspects:

First, we present a general framework for relevant feature extraction that uses both side-information and unlabeled data, and under this framework we propose Iterative Self-Enhanced Relevant Component Analysis (ISERCA). ISERCA can be regarded as a semi-supervised algorithm in the sample-pair space. In the first step, Gaussian Mixture Models and boosting are used to build a robust semi-supervised classifier in the sample-pair space, which yields additional side-information. In the second step, better features are extracted from the original and boosted side-information, and the samples are transformed into the new feature space. The two steps are iterated until a stopping criterion is met. Our experimental results show that ISERCA is robust and efficient.

Second, we present the idea of designing an invariant manifold distance from sample pairs, and propose the Tunable Nearest Feature Line classifier (TNFL). TNFL overcomes some flaws of the Nearest Feature Line classifier (NFL), which is likewise built from sample pairs. Experimental results on several UCI datasets and the AR face database substantiate the strength of TNFL.

Third, we propose the Structure-Shared Local Manifold Classifier (SSLMC), in which sample pairs are used to estimate local tangent subspaces in a probabilistic way. To counteract the adverse effects of small sample size and noisy data, we introduce the assumption of "sharing structures" among local sub-manifolds. This assumption proves reasonable on many real datasets, and SSLMC even outperforms the Support Vector Machine (SVM) on some of them. We believe the concept of "sharing structures" can be carried over to other manifold-based algorithms.
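The abstract describes side-information as positive/negative instance-level constraints feeding a relevant-component-style feature extractor. As a point of reference, the sketch below groups positive constraints into chunklets and applies an RCA-style whitening transform, the kind of building block ISERCA iterates on; the chunklet handling, regularization constant, and function names are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def rca_whitening(X, chunklets, eps=1e-6):
    """X: (n, d) data matrix.
    chunklets: list of index arrays; samples in the same chunklet are linked
    by positive (must-link) constraints and hence share an unknown label.
    Returns a whitening matrix W so that X @ W.T down-weights the
    within-chunklet (irrelevant) variability."""
    d = X.shape[1]
    scatter = np.zeros((d, d))
    count = 0
    for idx in chunklets:
        Xc = X[idx] - X[idx].mean(axis=0)      # center each chunklet
        scatter += Xc.T @ Xc
        count += len(idx)
    C = scatter / count + eps * np.eye(d)      # regularized within-chunklet covariance
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T   # W = C^(-1/2)

# Example: two chunklets derived from positive constraints.
X = np.random.randn(6, 3)
W = rca_whitening(X, [np.array([0, 1, 2]), np.array([3, 4])])
X_new = X @ W.T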
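For the second contribution, the underlying Nearest Feature Line idea is standard: a query is compared against the line spanned by each within-class prototype pair. The sketch below implements that distance with an optional clamp on the interpolation parameter as one plausible "tuning" knob; the abstract does not specify TNFL's actual mechanism, so the clamp is an assumption.

```python
import numpy as np
from itertools import combinations

def feature_line_distance(q, x1, x2, t_min=0.0, t_max=1.0):
    """Distance from query q to the feature line through prototypes x1, x2.
    Clamping t limits extrapolation along the line."""
    d = x2 - x1
    t = np.dot(q - x1, d) / (np.dot(d, d) + 1e-12)
    t = np.clip(t, t_min, t_max)
    p = x1 + t * d                         # (possibly clamped) projection onto the line
    return np.linalg.norm(q - p)

def nfl_classify(q, X, y, **kw):
    """Assign q to the class whose best feature line is closest."""
    best_label, best_dist = None, np.inf
    for label in np.unique(y):
        Xc = X[y == label]
        for i, j in combinations(range(len(Xc)), 2):
            dist = feature_line_distance(q, Xc[i], Xc[j], **kw)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# Toy usage: allow mild extrapolation beyond the prototype pair.
X = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 3])
y = np.array([0] * 10 + [1] * 10)
print(nfl_classify(np.array([3.0, 3.0]), X, y, t_min=-0.2, t_max=1.2))
```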
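For the third contribution, the generic ingredient is a local tangent subspace estimated from nearby samples, plus a point-to-patch distance. The sketch below shows that ingredient with plain local PCA; the probabilistic pair-based estimation and the exact "sharing structures" mechanism of SSLMC are thesis-specific and not reproduced here.

```python
import numpy as np

def local_tangent_basis(neighborhood, dim):
    """Estimate a local tangent subspace by PCA on a neighborhood of samples.
    Returns the patch mean and an orthonormal basis of shape (dim, n_features)."""
    mu = neighborhood.mean(axis=0)
    _, _, Vt = np.linalg.svd(neighborhood - mu, full_matrices=False)
    return mu, Vt[:dim]

def distance_to_patch(q, mu, basis):
    """Distance from a query q to the affine patch mu + span(basis):
    the norm of the residual left after projecting (q - mu) onto the basis."""
    r = q - mu
    return np.linalg.norm(r - basis.T @ (basis @ r))

# A classifier in this spirit keeps one (mu, basis) per local patch of each class
# and assigns a query to the class of its nearest patch; "sharing structures"
# could, for instance, mean that data-poor patches reuse a basis pooled from
# neighboring patches (an interpretation, not the thesis formulation).
```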
Keywords/Search Tags:Side-Information, instance-level constraints, pair-wise learning, invariant distance measure, structure-shared manifold classifier