Font Size: a A A

Semi-supervised Learning via Generalized Maximum Entropy

Posted on:2011-09-05Degree:Ph.DType:Thesis
University:New York UniversityCandidate:Erkan, Ayse NazFull Text:PDF
GTID:2448390002961426Subject:Engineering
Abstract/Summary:
The maximum entropy (MaxEnt) framework has been studied extensively in the supervised setting. Here, the goal is to find a distribution p that maximizes an entropy function while enforcing data constraints so that the expected values of some (pre-defined) features with respect to p match their empirical counterparts approximately. Using different entropy measures, different model spaces for p, and different approximation criteria for the data constraints, yields a family of discriminative supervised learning methods (e.g., logistic regression, conditional random fields, least squares and boosting) (Dudik & Schapire, 2006; Friedlander & Gupta, 2006; Altun & Smola, 2006). This framework is known as the generalized maximum entropy framework.;Semi-supervised learning (SSL) is a promising field that has increasingly attracted attention in the last decade. SSL algorithms utilize unlabeled data along with labeled data so as to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have had trade-offs, e.g., in terms of scalability or applicability to multi-categorical data.;In this thesis, we extend the generalized MaxEnt framework to develop a family of novel SSL algorithms using two different approaches: (1) Introducing Similarity Constraints We incorporate unlabeled data via modifications to the primal MaxEnt objective in terms of additional potential functions. A potential function stands for a closed proper convex function that can take the form of a constraint and/or a penalty representing our structural assumptions on the data geometry. Specifically, we impose similarity constraints as additional penalties based on the semi-supervised smoothness assumption , i.e., we restrict the MaxEnt problem such that similar samples have similar model outputs. The motivation is reminiscent of that of Laplacian SVM (Sindhwani et al., 2005) and manifold transductive neural networks (Karlen et al., 2008), however, instead of regularizing the loss function in the dual we integrate our constraints directly to the primal MaxEnt problem which has a more straight-forward and natural interpretation. (2) Augmenting Constraints on Model Features We incorporate unlabeled data to enhance the moment matching constraints of the generalized MaxEnt problem in the primal. We improve the estimates of the model and empirical expectations of the feature functions using our assumptions on the data geometry.;In particular, we derive the semi-supervised formulations for three specific instances of the generalized MaxEnt framework on conditional distributions, namely logistic regression and kernel logistic regression for multi-class problems, and conditional random fields for structured output prediction problems. A thorough empirical evaluation on standard data sets that are widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms. In addition to these benchmark data sets, we apply our approach to two real-life problems, vision based robot grasping, and remote sensing image classification where the scarcity of the labeled training samples is the main bottleneck in the learning process. For the particular case of grasp learning, we also propose a combination of semi-supervised learning and active learning, another sub-field of machine learning that is focused on the scarcity of labeled samples, when the problem setup is suitable for incremental labeling.;To conclude, the novel SSL algorithms proposed in this thesis have numerous advantages over the existing semi-supervised algorithms as they yield convex, scalable, inherently multi-class loss functions that can be kernelized naturally.
Keywords/Search Tags:Semi-supervised, Entropy, Maximum, SSL algorithms, Generalized, Maxent, Data, Framework
Related items