With the rapid development of science and technology, the popularity of mobile devices and the rise of social media, multimedia data have grown explosively in recent years. This growth brings great challenges to machine learning, computer vision and data mining. On the one hand, high-dimensional data are emerging in increasingly diverse domains. This raises the cost of data storage, increases the complexity of learning algorithms and degrades their generalization ability. On the other hand, data are collected in various ways and described in diverse forms: they are usually represented by features from multiple views, e.g., color, texture and shape. How to utilize the abundant information contained in multi-view data effectively and efficiently has become an urgent problem. Subspace learning maps data from a high-dimensional original feature space into a low-dimensional subspace while preserving certain statistical characteristics, and thereby avoids the "curse of dimensionality". However, many classical subspace learning methods, such as matrix factorization and topic models, ignore the inherent correlations among multiple features, so they cannot handle multi-view data effectively. Moreover, for single-view high-dimensional data, manifold regularized subspace learning can learn a compact representation that preserves the geometrical information of the data. However, existing methods usually ignore the discriminative information hidden in the data and cannot guarantee the sparseness of the matrix factors. To address these issues, this thesis studies subspace learning for high-dimensional data from two aspects: dimension reduction, and semi-supervised/unsupervised multi-view learning. The main contributions of this thesis are summarized as follows:

(1) We propose l2,1 norm and Hessian Regularized Non-negative Matrix Factorization with Discriminability (l2,1HNMFD) for data representation. On the one hand, l2,1HNMFD incorporates Hessian regularization into NMF, which has more favorable properties for unsupervised learning than Laplacian regularization. On the other hand, it exploits an l2,1 norm constraint to obtain a sparse representation and uses an approximate orthogonality constraint to characterize the discriminative information of the data. We develop an efficient optimization scheme to solve the resulting objective function. Extensive experimental results demonstrate that the proposed approach provides a better representation and achieves better clustering results.

(2) We propose Group Sparsity and Graph Regularized Semi-Nonnegative Matrix Factorization with Discriminability (GGSemi-NMFD) for data representation. GGSemi-NMFD adds a graph regularization term to Semi-NMF, which preserves the local geometrical information of the data space. To capture discriminative information, approximate orthogonality constraints are imposed on the learned subspace. In addition, an l2,1 norm constraint is imposed on the basis matrix, which encourages the basis matrix to be row-sparse. Experimental results on four data sets demonstrate the effectiveness of the proposed algorithm.

(3) We propose a graph regularized Multi-view Semantic subspace Learning algorithm (MvSL). Previous semi-supervised/supervised multi-view learning methods impose implicit relationship constraints on the encodings of labeled items. To address this problem, MvSL combines non-negative matrix factorization (NMF) and graph embedding, imposing direct relationship constraints on the data items in the target subspace. In this way it learns a compact representation for multi-view data and bridges low-level features and high-level semantics. Moreover, MvSL encourages each latent dimension to be associated with a subset of views via sparseness constraints, which allows it to capture flexible conceptual patterns hidden in multi-view features. Experiments on three real-world datasets demonstrate the effectiveness of MvSL.

(4) We propose Dual regularized Multi-view Non-negative Matrix Factorization (DMvNMF) for data clustering. Previous unsupervised multi-view learning methods fail to exploit the merit of dual manifold regularization. To address this flaw, DMvNMF simultaneously exploits the geometric structure of the data space and that of the feature space within the NMF framework for multi-view data clustering. Experimental results on three real-world datasets demonstrate the effectiveness of DMvNMF for multi-view data clustering, where it significantly outperforms other baseline methods.
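Contributions (1) and (2) rely on the l2,1 norm to induce row sparsity. The pure-Python sketch below is a minimal illustration only and is not the thesis's optimization scheme. It computes ||M||_{2,1}, the sum of the Euclidean norms of the rows of M, and shows why this norm favors row-sparse matrices. The two example matrices have the same Frobenius norm and the same entry-wise l1 norm, yet the one whose energy sits in a single row has the smaller l2,1 norm. The sketch also computes the diagonal reweighting matrix that iterative-reweighting solvers commonly use for l2,1 terms. The function names and the eps guard are illustrative assumptions.

```python
import math


def row_norms(M):
    """Euclidean (l2) norm of each row of M, given as a list of lists."""
    return [math.sqrt(sum(x * x for x in row)) for row in M]


def l21_norm(M):
    """||M||_{2,1}: the sum of the l2 norms of the rows of M."""
    return sum(row_norms(M))


def l21_reweight(M, eps=1e-8):
    """Diagonal of D with D_ii = 1 / (2 * ||m_i||_2).

    For nonzero rows the gradient of ||M||_{2,1} equals 2 * D @ M, which is
    also the gradient of Tr(M^T D M) with D held fixed; reweighting solvers
    exploit this. eps (an illustrative choice) avoids dividing by zero rows.
    """
    return [1.0 / (2.0 * max(r, eps)) for r in row_norms(M)]


# Same Frobenius norm (5) and same entry-wise l1 norm (7) ...
row_sparse = [[3.0, 4.0], [0.0, 0.0]]
spread = [[3.0, 0.0], [0.0, 4.0]]
# ... but the l2,1 norm is smaller when the energy is concentrated in few rows.
print(l21_norm(row_sparse))  # 5.0
print(l21_norm(spread))      # 7.0
```

Because minimizing the l2,1 norm penalizes each nonzero row as a whole, whole rows of the basis matrix tend to be driven to zero together. This is the "row-sparse" effect described in contribution (2).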
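Contributions (1), (2) and (4) all add a graph-based smoothness term to a matrix factorization objective, and contribution (4) regularizes both the data space and the feature space. The numpy sketch below illustrates that general pattern for a single view, using the well-known multiplicative update rules for graph-Laplacian regularized NMF applied to both factors. It is not the thesis's l2,1HNMFD, GGSemi-NMFD or DMvNMF algorithm. It uses Laplacian rather than Hessian regularization, has no sparsity or orthogonality terms, and handles only one view. The 0/1 p-nearest-neighbour graphs, the parameter values (lam, mu, p, n_iter) and all function names are illustrative assumptions. The objective minimized is ||X - U V^T||_F^2 + lam * Tr(V^T L_V V) + mu * Tr(U^T L_U U) subject to U >= 0 and V >= 0. Here L_V and L_U are the Laplacians of the sample graph and the feature graph.

```python
import numpy as np


def knn_graph(P, p=3):
    """Symmetric 0/1 p-nearest-neighbour affinity matrix over the rows of P."""
    sq = np.sum(P ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * P @ P.T  # squared distances
    np.fill_diagonal(dist, np.inf)                      # no self-loops
    nn = np.argsort(dist, axis=1)[:, :p]                # p closest rows
    W = np.zeros(dist.shape)
    W[np.repeat(np.arange(P.shape[0]), p), nn.ravel()] = 1.0
    return np.maximum(W, W.T)                           # symmetrize


def objective(X, U, V, WU, WV, lam, mu):
    """||X - U V^T||_F^2 + lam*Tr(V^T L_V V) + mu*Tr(U^T L_U U)."""
    LU = np.diag(WU.sum(axis=1)) - WU  # feature-graph Laplacian
    LV = np.diag(WV.sum(axis=1)) - WV  # sample-graph Laplacian
    rec = np.linalg.norm(X - U @ V.T, "fro") ** 2
    return rec + lam * np.trace(V.T @ LV @ V) + mu * np.trace(U.T @ LU @ U)


def dual_graph_nmf(X, k, lam=1.0, mu=1.0, p=3, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates for single-view dual graph regularized NMF.

    X is (m features x n samples) and nonnegative; returns U (m x k),
    V (n x k) and the objective value before and after every iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U, V = rng.random((m, k)), rng.random((n, k))
    WU, WV = knn_graph(X, p), knn_graph(X.T, p)  # feature / sample graphs
    DU, DV = np.diag(WU.sum(axis=1)), np.diag(WV.sum(axis=1))
    history = [objective(X, U, V, WU, WV, lam, mu)]
    for _ in range(n_iter):
        # Numerator: negative part of the gradient; denominator: positive part.
        U *= (X @ V + mu * WU @ U) / (U @ V.T @ V + mu * DU @ U + eps)
        V *= (X.T @ U + lam * WV @ V) / (V @ U.T @ U + lam * DV @ V + eps)
        history.append(objective(X, U, V, WU, WV, lam, mu))
    return U, V, history


X = np.random.default_rng(1).random((20, 15))
U, V, history = dual_graph_nmf(X, k=3)
print(history[0], history[-1])  # the objective should have decreased
```

Each update multiplies a factor elementwise by the ratio of the negative part to the positive part of its gradient. This keeps both factors nonnegative without any projection step, and it makes the objective non-increasing. The small eps in the denominators only guards against division by zero.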