Research On Clustering With Multi-view Data

Posted on: 2021-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Wang
GTID: 1488306473972359
Subject: Computer Science and Technology

Abstract/Summary:
In the current big data era, data objects can usually be represented from multiple views, which produces multi-view data. Multi-view data bring new challenges and opportunities to traditional machine learning approaches, and how to mine and exploit such data has drawn a lot of attention in recent years. Clustering analysis is a key technology for finding the intrinsic structure of data. Clustering with multi-view data, known as multi-view clustering, aims to exploit multi-view data to build a more accurate and robust clustering model. This dissertation studies multi-view clustering based on graph models, spectral graph theory, matrix factorization and parallel computing technologies. The main research work and contributions are summarized as follows:

(1) Adaptive-neighbor Graph Learning for Multi-view Clustering
Existing multi-view graph clustering works do not analyze the generalization of the initial data graphs or the universality of graph-based clustering models. To address these two research gaps, this dissertation designs a general multi-view graph clustering framework. Within this framework, it studies the impact of different graph metrics on the clustering results. It also discusses the relationship between the rank of the graph Laplacian matrix and the number of clusters, and presents a self-weighted strategy. It then proposes a self-weighted adaptive-neighbor graph learning approach for multi-view clustering. Experimental results on multiple real-world datasets show that the performance of multi-view graph clustering depends on the initial data graphs, and demonstrate the effectiveness of the proposed approach.

(2) Joint Graph Learning for Multi-view Clustering
Most graph-based multi-view clustering methods do not give sufficient consideration to the weights of different views and require an additional clustering step to produce the final clusters. They also usually optimize their objectives based on fixed graph matrices of all views. To handle these problems, and building on the adaptive-neighbor graph learning approach of contribution (1), this dissertation proposes a joint graph learning method for multi-view clustering. The proposed method takes the data graph matrices of all views and automatically fuses them to generate a unified graph matrix. The unified graph matrix in turn improves the data graph matrix of each view, and gives the final clusters directly, without any additional manually tuned parameters. Experimental results on both toy and real-world datasets show that the proposed method markedly outperforms state-of-the-art baselines.

(3) Spectral Perturbation Meets Incomplete Multi-view Clustering
Existing multi-view clustering algorithms mostly assume that each data instance is sampled in all views. During data collection, however, some data instances are missing in certain views, which leads to incomplete multi-view data. To deal with the challenges of incomplete multi-view data, this dissertation builds a strong link between perturbation risk bounds and incomplete multi-view clustering. It then proposes a perturbation-oriented incomplete multi-view clustering method. Theoretical results show that the minimization of perturbation risk bounds among different views maximizes the final fusion result across all views. This provides a solid fusion criterion for multi-view data. Experimental results on incomplete multi-view datasets show the superiority of the proposed method.

(4) Multi-view Clustering in Parallel Computing
Because multi-view data intrinsically contain multiple views, multi-view clustering algorithms often incur a high computational cost. To overcome this computational limitation of traditional multi-view clustering algorithms, this dissertation studies parallel multi-view clustering in distributed computing. It first explores an advanced matrix factorization technique known as concept factorization. It then proposes a novel multi-view clustering method, called multi-view concept clustering. The proposed method exploits manifold learning to account for the original geometric structure of the data. An alternating iterative optimization algorithm based on the multiplicative principle is presented to optimize the formulated objective function. As each optimization step is independent, the dissertation then proposes a distributed parallel computing scheme for multi-view concept clustering. Experimental results show the effectiveness and efficiency of the proposed method.
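
The relationship between the rank of the graph Laplacian and the number of clusters mentioned in contribution (1) can be illustrated with a standard fact from spectral graph theory: for a graph with nonnegative edge weights, the number of zero eigenvalues of the Laplacian L = D - W equals the number of connected components c, so rank(L) = n - c. The following is a minimal sketch of that fact only, not the dissertation's algorithm; the toy affinity matrix, the function names and the tolerance are assumptions chosen for illustration.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def count_components(W, tol=1e-8):
    """Count connected components as the number of (near-)zero Laplacian eigenvalues."""
    eigvals = np.linalg.eigvalsh(laplacian(W))
    return int(np.sum(eigvals < tol))

# Hypothetical toy affinity matrix: two triangles {0,1,2} and {3,4,5}
# with no edges between them, i.e. two ideal clusters.
block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

n = W.shape[0]
c = count_components(W)
rank = np.linalg.matrix_rank(laplacian(W))
print(c, rank, n - c)  # prints: 2 4 4
```

A graph whose Laplacian has rank exactly n - c therefore splits into c components that can be read off as clusters without a separate clustering step, which is one common motivation for imposing a rank constraint on a learned graph.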
Keywords/Search Tags: Multi-view Clustering, Graph Clustering, Spectral Clustering, Concept Factorization, Parallel Computing