Study On Large-scale Multi-label Learning

Posted on: 2019-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zhang
Full Text: PDF
GTID: 2428330566460777
Subject: Software engineering
Abstract/Summary:
Large-scale multi-label learning (LMLL) aims to learn classifiers that can automatically annotate a data point with the most relevant subset of labels from an extremely large label set. It has been widely used in applications such as tagging, ranking, and recommendation, and has therefore drawn considerable attention for its practical importance. The main challenge is that both the feature space and the label space are extremely high-dimensional and sparse. With L labels there are 2^L possible label sets, which becomes intractable when the label dimension L is huge, e.g., in the millions for Wikipedia labels. This paper proposes two novel methods that simultaneously exploit semantic label correlations and establish nonlinear feature embeddings. Experimental results on several benchmark datasets demonstrate the effectiveness and efficiency of our methods.

The main contributions of this paper are as follows:

• This paper presents an efficient large-scale multi-label learning method (CoMFM). The method consists of two innovations: i) We present a novel collaborative label embedding algorithm that exploits semantic label correlations by applying collaborative filtering techniques to the label co-occurrence matrix, instead of the training label matrix, and thereby obtains low-dimensional latent representations for all labels. ii) To the best of our knowledge, this is the first work that combines high-order feature correlations and label correlations simultaneously for LMLL. Specifically, to learn high-order nonlinear feature embeddings, we extend the vanilla factorization machine to a multi-output form.

• This paper presents a deep learning based large-scale multi-label learning method (DXML). The method also consists of two innovations: i) We present a novel deep label graph embedding algorithm to learn low-dimensional representations for all labels. To the best of our knowledge, this is the first work to introduce an explicit label graph structure into LMLL. ii) We present a nonlinear feature embedding based on a deep neural network; this is an early work on adapting deep learning to the LMLL setting.

• Experimental results on several benchmark datasets confirm that: i) CoMFM performs competitively against state-of-the-art methods at lower computational cost, running, surprisingly, 10-120x faster than recent embedding-based methods with similar accuracy; ii) DXML outperforms all embedding-based methods.
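
To make the idea of embedding labels from the co-occurrence matrix (rather than the training label matrix) concrete, here is a minimal sketch. It is not the CoMFM algorithm: the abstract does not specify the collaborative filtering model, so a truncated SVD is used as a stand-in, and the function names `label_cooccurrence` and `embed_labels`, the toy matrix `Y`, and the rank `k` are all assumptions made for illustration.

```python
import numpy as np


def label_cooccurrence(Y):
    """Count how often each pair of labels appears on the same data point.

    Y is an (n_samples, n_labels) binary label matrix. The result is a
    symmetric (n_labels, n_labels) matrix with a zeroed diagonal.
    """
    Y = np.asarray(Y, dtype=float)
    C = Y.T @ Y
    np.fill_diagonal(C, 0.0)  # drop self co-occurrence (plain label counts)
    return C


def embed_labels(C, k):
    """Rank-k label embeddings from a truncated SVD of C.

    Assumption: SVD stands in for the collaborative filtering step,
    which the abstract does not describe in detail.
    """
    U, s, _ = np.linalg.svd(C)
    return U[:, :k] * np.sqrt(s[:k])


# Toy data: 4 data points, 3 labels (hypothetical values).
Y = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
])

C = label_cooccurrence(Y)  # labels 0 and 1 co-occur twice, 0 and 2 once
Z = embed_labels(C, k=2)   # one 2-dimensional vector per label

print(C)
print(Z.shape)
```

The dense product `Y.T @ Y` is only practical for small label sets; at the scale the abstract describes (millions of labels), a sparse representation and an iterative factorization would presumably be required.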
Keywords/Search Tags: Large-scale Multi-label Learning, Deep Learning, Label Graph, Multi-output Factorization Machine, Collaborative Filtering