
Online Semi-Supervised Learning Theory,Algorithms And Applications

Posted on: 2015-05-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: B L Sun    Full Text: PDF
GTID: 1108330479979574    Subject: Control Science and Engineering
Abstract/Summary:
Online semi-supervised learning (OS2L) is a relatively new subfield of machine learning that has been used in pattern recognition, data mining and information retrieval since the beginning of the twenty-first century. It combines traditional semi-supervised learning with online learning and is concerned with using labeled and unlabeled examples in an online manner. OS2L algorithms proceed in a sequence of consecutive learning rounds. On each round, the learner is given a training example and is required to predict its label if the example is unlabeled. During the OS2L process, the learner may update its predictor so as to be more accurate in later rounds. In the age of big data, scholars and engineers pay increasing attention to the ability of OS2L to reduce computational overhead and update models quickly. Studies of OS2L are therefore important both practically and theoretically.

Several OS2L algorithms have been proposed in recent years. However, these methods, designed from researchers' intuition and experience, are simple extensions of traditional analysis tools to OS2L problems, which makes it difficult to understand their intrinsic differences. This dissertation first builds an online semi-supervised learning framework (OS2LF) that interprets OS2L and its extensions from a unified viewpoint and helps in developing new OS2L algorithms. Based on the OS2LF, we carry out further research on online manifold regularization, online co-regularization, online semi-supervised support vector machines (online S3VMs) and online semi-supervised learning with multiple regularization terms. The main achievements and contributions of the dissertation are as follows:

(1) A novel online semi-supervised learning framework based on dual ascending procedures is proposed. The basic descriptions and assumptions of the OS2L problem are presented in terms of convex optimization. Using Fenchel conjugates, we analyze the OS2L process through the dual function. The optimal predictor can be obtained by maximizing the dual function, and OS2L algorithms can be seen as ascending procedures of the dual function across learning rounds. We derive an upper loss bound for OS2L algorithms and analyze its elements qualitatively. This work paves the way for the design and analysis of OS2L algorithms.

(2) A novel online manifold regularization algorithmic framework is proposed. Manifold regularization is a geometric framework for learning from examples: it exploits the geometry of the probability distribution that generates the data and incorporates it as an additional regularization term. The dual function of manifold regularization can be described by new groups of independent coefficient vectors. Online manifold regularization algorithms derived from gradient ascent are essentially different dual ascending procedures, and the previous online manifold regularization algorithm can be seen as a special case of our framework. For practical purposes, two buffering strategies and two sparse approximations are proposed to reduce the computational complexity. Detailed experiments verify the utility of our approaches. Another important conclusion is that our online manifold regularization algorithms can handle settings where the target hypothesis is not fixed but drifts with the sequence of examples.
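To make the per-round mechanics in (2) concrete, the following Python code is a minimal sketch of a gradient-step approximation of online manifold regularization. It assumes a Gaussian kernel that doubles as the graph edge weight, a hinge loss on labeled rounds, and a fixed-size FIFO buffer standing in for the buffering strategies mentioned above; the class name OnlineManifoldReg and all hyper-parameter defaults are hypothetical illustrations, not the dissertation's exact dual ascending updates.

    import numpy as np
    from collections import deque

    def rbf(a, b, gamma=1.0):
        """Gaussian kernel, reused here as the graph edge weight w(a, b)."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(np.exp(-gamma * np.sum((a - b) ** 2)))

    class OnlineManifoldReg:
        """Kernel predictor f(x) = sum_i alpha_i * K(x_i, x) over a FIFO buffer."""

        def __init__(self, lam1=0.01, lam2=0.05, eta=0.1, gamma=1.0, buf_size=100):
            self.lam1, self.lam2, self.eta, self.gamma = lam1, lam2, eta, gamma
            self.xs = deque(maxlen=buf_size)      # buffered examples x_i
            self.alphas = deque(maxlen=buf_size)  # their expansion coefficients

        def predict(self, x):
            return sum(a * rbf(xi, x, self.gamma)
                       for xi, a in zip(self.xs, self.alphas))

        def step(self, x, y=None):
            """One round: predict f(x), then take a gradient step on the
            instantaneous risk = hinge loss (labeled rounds only)
            + (lam1/2) * ||f||_K^2 + (lam2/2) * sum_i w_i * (f(x_i) - f(x))^2."""
            f_x = self.predict(x)
            f_buf = [self.predict(xi) for xi in self.xs]  # predictions before this update
            # RKHS-norm term: shrink every stored coefficient.
            for i in range(len(self.alphas)):
                self.alphas[i] *= (1.0 - self.eta * self.lam1)
            alpha_x = 0.0
            if y is not None and y * f_x < 1:             # hinge loss, labeled rounds only
                alpha_x += self.eta * y
            # Manifold term: pull predictions on similar points toward each other.
            for i, (xi, f_i) in enumerate(zip(self.xs, f_buf)):
                w = rbf(xi, x, self.gamma)
                self.alphas[i] -= self.eta * self.lam2 * w * (f_i - f_x)
                alpha_x += self.eta * self.lam2 * w * (f_i - f_x)
            self.xs.append(np.asarray(x, dtype=float))
            self.alphas.append(alpha_x)
            return f_x

On unlabeled rounds (y=None) only the RKHS shrinkage and the manifold term fire, which is how unlabeled examples influence the predictor in this setting; the FIFO buffer keeps the kernel expansion, and hence the per-round cost, bounded.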
(3) A novel online co-regularization algorithmic framework is proposed. Co-regularization improves generalization accuracy by using unlabeled data seen through different views. Multiple hypotheses are trained in the co-regularization framework and are required to make similar predictions on any given unlabeled example. Hinge-loss functions and tolerance functions are used in the co-regularization problems. We extend the definition of the Fenchel conjugate to multi-variable functions in order to solve online multi-view semi-supervised learning problems. The existing online co-regularization algorithms in previous work can be viewed as approximations of our dual ascending process using gradient ascent, and new algorithms are derived based on the idea of ascending the dual function more aggressively. For practical purposes, we also propose two multi-view sparse approximation approaches for the kernel representation to reduce the computational complexity. Experiments show that the aggressive online co-regularization algorithms achieve better accuracy and stability.

(4) A novel online semi-supervised support vector machines (online S3VMs) algorithmic framework is proposed. Like SVMs, S3VMs learn a large-margin hyperplane classifier from labeled training data, but they simultaneously force this hyperplane to stay far away from the unlabeled data. If there is a “gap” or low-density region between the underlying distributions of the two classes, S3VMs can help because they select a decision rule with exactly those properties. Based on the concave-convex procedure (CCCP), we transform the objective function of S3VMs from nonconvex to convex (see the sketch below). Online S3VM algorithms can also be analyzed through the dual function. An imbalance penalization function is proposed to penalize imbalanced classifications during the learning process. Two online S3VM algorithms are derived by updating a limited set of dual coefficients: (1) aggressive dual ascending; (2) the local concave-convex procedure (LCCCP). We draw connections to earlier analysis techniques. Experiments show that our S3VM algorithms perform well on synthetic and real-world datasets.

(5) A novel online semi-supervised learning framework with multiple regularization terms is proposed. Generally, multiple semi-supervised terms lead to major empirical improvements on real-world tasks, and these approaches can be explained by regularization theory and the capacity control of function classes. For simplicity and concreteness, we focus on semi-supervised learning problems based on manifold regularization and co-regularization. Using multi-variable Fenchel conjugates, online semi-supervised learning algorithms with multiple regularization terms can also be designed through dual ascending procedures. The difference is that the dual functions have many more coefficient variables controlling their values. The new algorithms achieve either a gradient dual increment or a maximal dual increment on each learning round. Experiments show that multiple regularization terms can also improve the generalization accuracy of OS2L algorithms.
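The CCCP convexification referred to in (4) can be illustrated on the loss an S3VM assigns to an unlabeled point. The Python sketch below works under assumptions: it takes the symmetric “hat” loss max(0, 1 - |f(x)|) as the unlabeled-point loss and uses one common convex-plus-concave split; the function names are hypothetical, and this decomposition is not necessarily the exact one used in the dissertation.

    import numpy as np

    def hat_loss(t):
        """Nonconvex S3VM loss on an unlabeled point with score t = f(x):
        zero far from the decision boundary, maximal on it."""
        return max(0.0, 1.0 - abs(t))

    def cccp_surrogate(t, t_prev):
        """One CCCP step: split hat(t) = [H(t) + H(-t)] - (1 + |t|), with
        H(u) = max(0, 1 - u), into a convex part (two hinges) and a concave
        part -(1 + |t|), then replace the concave part by its tangent at the
        previous score t_prev. The result is convex in t."""
        convex_part = max(0.0, 1.0 - t) + max(0.0, 1.0 + t)
        tangent = -(1.0 + abs(t_prev)) - np.sign(t_prev) * (t - t_prev)
        return convex_part + tangent

    # The surrogate upper-bounds the nonconvex loss and touches it at t_prev,
    # so minimizing the convex surrogate cannot increase the original objective.
    t_prev = 0.4
    for t in (-2.0, -0.5, 0.0, 0.4, 1.5):
        assert cccp_surrogate(t, t_prev) >= hat_loss(t) - 1e-12
    assert abs(cccp_surrogate(t_prev, t_prev) - hat_loss(t_prev)) < 1e-12

Re-linearizing the concave part at the new score and re-solving the resulting convex problem is the general CCCP pattern; the online algorithms in (4) perform such updates on a limited set of dual coefficients each round.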
Keywords/Search Tags: Online Semi-supervised Learning, Online Manifold Regularization, Online Co-regularization, Online Semi-supervised Support Vector Machines, Online Semi-supervised Learning with Multiple Regularization Terms, Fenchel Conjugate, Dual Ascending Procedure