Font Size: a A A

Research On Graph-based Large Scale Semi-supervised Learning Algorithms And Its Applications

Posted on:2018-08-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:B LiuFull Text:PDF
GTID:1318330542477556Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Semi-supervised learning is an idea that tries to learn the decision information from both labeled and unlabeled data.There are many concrete semi-supervised learning algorithms,and the graph-based one is an example attracting lots of focus in recent years.Although the basic theories of semi-supervised learning tend to be completed,however,there are still some areas to be improved.For example,apply the idea of semi-supervised learning to other problems;how to construct a more robust graph based on a more accuracy similarity metric;and how to understand the semi-supervised learning deeply from the view of the generative model in practice.In this study,the basic methods were firstly summarized following by the solutions of the three aforementioned problems.The details are as follows,(1)Firstly,applying the idea of the graph to the matrix completion based multi-label learning.This dissertation presents a novel framework for multi-label learning based on regularized low-rank minimization of the joint matrix of the feature-by-item matrix and label-by-item matrix.It can explicitly explore the linear correlation among different columns of the matrix via low-rank minimization and automatically complete the missing labels via matrix completion.From the view of semi-supervised learning,the traditional multi-label learning approaches based on matrix completion was extended with more reasonable modeling of the intrinsic structures of data.In more detail,in order to leverage the scarcity of labeled examples,this dissertation adopts the manifold assumption,i.e.,instances lie in a small local neighborhood region should also share a similar set of labels.To this end,the graph Laplacian is calculated to constrain the process of matrix completion.The resulted nuclear norm minimization problem can be solved with a modified fixed-point continuation method which is guaranteed to converge to the global optimum.Experiments on both one synthetic and four real-world datasets have shown that our proposed method can achieve promising results.Although the fixed-point continuation method can solve our problem,it is very expensive for computation.To alleviate it,this dissertation switches to an alternative way,the Alternating Direction Method of Multipliers(ADMM)as the improved version for larger scale dataset optimization.The resulted nuclear norm minimization problem can be solved with ADMM which is guaranteed to converge to the global optimum.Moreover,it is more efficient than the optimization methods using the traditional model.Complexity analysis and experiments(on both one synthetic and four real-world datasets)have shown that our proposed method can achieve promising results on multi-label learning.(2)Among semi-supervised learning methods,graph-based methods have attracted a lot of focus,which is based on the local pair-wised distance measures between a pair of vertices.The pair-wised similarity measure induced on a graph naturally meets with the concept of the kernel,since kernel naturally defines a generalized pair-wised similarity between objects.However,a different kernel may occur different similarity matrix based on the same dataset,and as well as the from similarity matrix to graph.To this end,a novel Multiple Graph Multiple Kernel Learning framework was presented to define a new Reproducing Kernel Hilbert Space(RKHS).On the one hand,it incorporates the global geometric information of both labeled and unlabeled data and on the second-hand combines the RKHS affiliated with each base kernel function.To the best of our knowledge,this is the first functional framework for semi-supervised multiple kernel learning.Under this new RKHS,this dissertation employs multiple kernel learning algorithms to solve the related learning problems.Moreover,a further step of employing multiple kernel learning to select the graph structure was presented.Experiments on two real-world data sets indicate the promising result of our proposed approaches.(3)Multi-view learning is another method of semi-supervised learning.The practical scenarios that collected from different domains do not share a common feature dictionary while all the domains correspond to the same label space and thus can be semantically dependent.The above problem brings challenges for traditional multi-view learning and multi-task learning.To this end,this dissertation proposes a graphical framework of semantically dependent multi-view multi-task learning,which based on the Sparse Gaussian Conditional Random Field and Hilbert-Schmidt Independence Criterion(HSIC).In this graphical model,different data sources are generated by their representations in the latent space,where the dependence among latent representations occurs.Different from traditional literature,where the latent representations are in common,this dissertation introduces the HSIC to model the dependence between latent feature spaces.Furthermore,in order to utilize the structure in the label space,this dissertation adopts a Sparse Gaussian Conditional Random Field(SGCRF)to describe the dependence of the category space and the association between the latent feature space and the category space.This dissertation develops an efficient Variational Expectation-Maximization(VEM)framework for the model inference.For model evaluation,the experiments on both synthetic data and real-world data were conducted.Experimental results have indicated that the discussed setting is meaningful and the proposed framework can not only improve the classification accuracy for multi-category classification and multi-label classification,but also output the concept structure of the label space,and the dependence graph between features and the label set.
Keywords/Search Tags:Semi-supervised Learning, Manifold Assumption, Matrix Completion, Multiple Kernel Learning, Multi-view Learning
PDF Full Text Request
Related items