
Research On Data Clustering

Posted on: 2022-06-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q L Zhao
GTID: 1488306338984609
Subject: Computer application technology
Abstract/Summary:
Over time, both the dimensionality of the data used by machine learning algorithms and the number of samples have reached an unprecedented scale. The assumptions of traditional clustering algorithms no longer hold, and their time complexity is unacceptable. To address these challenges, this dissertation investigates clustering algorithms from three perspectives: subspace clustering, multi-view clustering, and deep clustering. We survey the state of development and the representative algorithms in each of the three classes, and identify four specific problems to study.

First, to address the interdependence of sub-tasks in subspace clustering algorithms, we propose using pairwise constraints to break the interdependence between finding subspaces and assigning samples to subspaces. This allows the subspaces that contain clusters to be found more accurately, and leads to a subspace clustering algorithm that performs dimension selection and dimension weighting simultaneously. Experimental results show that the proposed algorithm outperforms existing algorithms.

Second, to address the inconsistent distribution of clusters within each view in multi-view clustering, we propose a multi-view clustering algorithm that assigns weights to the clusters of each view. By giving more weight to the clusters that are more clearly distinguishable within each view, the algorithm makes better use of the information shared between views and increases the accuracy of cluster assignment. Experimental results show that the proposed multi-view clustering algorithm improves clustering accuracy.

Third, to address missing intra-view representations in multi-view data, we propose a multi-view clustering algorithm for data with incomplete correspondence. The algorithm matches the representations of incomplete instances using bipartite graphs, which allows multi-view clustering to handle a wider range of real-world data sets. Experimental results indicate that the algorithm correctly handles multi-view data containing incomplete instances and obtains accurate clustering results.

Finally, we investigate the problem of early information fusion in deep neural networks and multimodal clustering, and propose a deep multimodal clustering algorithm based on cross-reconstruction. To fuse information from other modalities while extracting the latent features of each modality with an autoencoder, we design two new types of autoencoder networks, global cross-reconstruction and local cross-reconstruction, design a clustering algorithm based on each network, and integrate both into one framework. Experimental results show that the proposed cross-reconstruction exploits the information of different modalities and improves clustering performance.

The research in this dissertation shows that discovering the intrinsic regularities of data, designing algorithms rationally, and using innovative tools improved the performance of several data clustering tasks and extended their range of applications, making a small contribution to the development of unsupervised learning.
Keywords/Search Tags: High-dimensional Data, Subspace Clustering, Multi-view Clustering, Deep Clustering