
Study On High-dimensional Data Subspace Clustering Analysis And Application

Posted on: 2020-09-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H Z Chen    Full Text: PDF
GTID: 1368330602450281    Subject: Applied Mathematics
Abstract/Summary:
Clustering is an important means of data analysis. Through cluster analysis, the distribution characteristics hidden in a data set can be discovered effectively, laying a good foundation for further full and effective use of the data. With the rapid development of information technology, clustering faces not only ever-increasing amounts of data but also data of very high dimensionality. Owing to the "curse of dimensionality", many clustering methods that perform well in low-dimensional spaces fail to achieve good results in high-dimensional spaces, which poses a great challenge to cluster analysis of high-dimensional data. High-dimensional data clustering is therefore a key and difficult problem in cluster analysis, and subspace clustering based on spectral clustering is an effective way to address it. The purpose of subspace clustering is to segment high-dimensional data drawn from a union of essentially lower-dimensional subspaces according to those subspaces. It is a relatively new approach to high-dimensional data clustering with wide applications in machine learning, computer vision, image processing, system identification, and other fields. In this dissertation, several new clustering models are proposed for high-dimensional data subspace clustering. The main contributions are the following:

1. Based on an analysis of the relationship between the self-representation coefficient matrix and the cluster indicator matrix, we propose a new unified minimization framework for affinity learning and subspace clustering: Structured Sparse Subspace Clustering with Direction-Grouping-Effect-Within-Cluster (SSDG). In SSDG, we define the concept of a direction-grouping-effect-within-cluster (DG) to group data from the same subspace together. Based on DG, we design a new regularization term that couples the self-representation coefficient matrix and the indicator matrix and interactively enforces the expected properties on both: the indicator matrix forces the self-representation coefficient vectors to have large cosine similarity (i.e., DG) whenever the data points are drawn from the same subspace and share the same cluster label, while the self-representation coefficient matrix forces data points to share the same cluster label whenever their self-representation coefficient vectors have large cosine similarity. Incorporating the new penalty into the Structured Sparse Subspace Clustering model, which considers only the structured sparseness of the affinity matrix, yields the unified minimization framework SSDG, which accounts for both structured sparseness and DG of the affinity matrix; a schematic form of such a coupled objective is sketched below. Experimental results on several commonly used datasets demonstrate that our method outperforms other state-of-the-art methods in revealing the subspace structure of high-dimensional data.
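To make the idea concrete, a unified objective of this general shape can be written schematically as follows. This is only an illustrative sketch patterned after standard self-expressive subspace clustering models; the weights alpha and beta, the exact coupling term Omega_DG, and the constraints are placeholders and do not reproduce the precise SSDG formulation given in the dissertation.

\min_{C,\,Q}\; \|C\|_{1}
\;+\; \alpha \sum_{i,j} |c_{ij}|\,\theta_{ij}(Q)
\;+\; \beta\, \Omega_{\mathrm{DG}}(C,Q)
\quad \text{s.t.}\quad X = XC,\;\; \operatorname{diag}(C) = 0,

where X is the data matrix whose columns are the data points, C is the self-representation coefficient matrix, Q is the cluster indicator matrix, \theta_{ij}(Q) penalizes nonzero coefficients between points assigned to different clusters (structured sparseness), and \Omega_{\mathrm{DG}}(C,Q) rewards large cosine similarity between the coefficient vectors of points assigned to the same cluster (the DG property).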
2. Based on the observation that the affinity matrix and the labels should be both discriminative and coherent, we propose a new unified optimization framework for subspace clustering: Discriminative and Coherent Subspace Clustering (DCSC). In DCSC, we present a new regularizer that combines the labels and the affinity matrix to enforce the coherence of the affinity for data points from the same cluster and the discrimination of the labels for data points from different clusters. Combining this label-guided regularizer with the structured sparse regularizer of Structured Sparse Subspace Clustering, which enforces only the discrimination of the affinity matrix for data points from different clusters and the coherence of the labels for data points from the same cluster, gives the unified optimization framework DCSC. It enforces coherence and discrimination of both the affinity matrix and the labels, and can better recover the subspace structure underlying high-dimensional datasets. Extensive experiments on commonly used datasets demonstrate that our method performs better than several state-of-the-art two-stage methods and the unified Structured Sparse Subspace Clustering.

3. Sparse Spectral Clustering improves the traditional method by introducing a sparse regularizer that forces the latent affinity matrix to be cluster-discriminative. However, it is a two-stage method that does not fully exploit the relationship between the affinity matrix and the data labels. Structured Sparse Subspace Clustering combines affinity learning and cluster indicator inference in one unified framework and thus outperforms two-stage methods, but it does not consider the sparsity of the latent affinity matrix. We present a new data-adaptive sparse regularizer that enforces the cluster-discrimination property of the latent affinity matrix, so that the intrinsic correlations among the data are revealed and the blindness of the sparsity penalty in Sparse Spectral Clustering is overcome. Combining the new regularizer with the SSSC model yields a new unified optimization model, Discrimination Enhanced Spectral Clustering (DESC). The DESC model has enhanced cluster discrimination and hence better clustering performance. Extensive experiments on commonly used datasets demonstrate that our method reveals the subspace structure better than state-of-the-art two-stage methods and the unified method SSSC.

4. Sparse Spectral Clustering approximates the K-block-diagonal latent affinity matrix indirectly through sparse structural priors, and it is nonconvex, making it difficult to solve for the cluster indicator matrix directly. To address these problems, we propose a block-diagonal-matrix-induced regularizer that directly pursues a block-diagonal latent affinity matrix. Combining the new regularizer with the spectral clustering model under different conditions, we obtain two new models, both called Block Diagonal Spectral Clustering (BDSpeC). For each model we give an effective algorithm that solves for the cluster indicator matrix directly; the generic spectral step underlying all of the above models is sketched below. Experiments on several real datasets demonstrate the effectiveness of our models.
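For readers unfamiliar with the spectral step shared by the models above, the sketch below shows the generic pipeline: a symmetric affinity matrix is built from a self-representation coefficient matrix, a normalized graph Laplacian is formed, and a relaxed cluster indicator matrix is obtained from its bottom eigenvectors before k-means assigns labels. This is a minimal illustration using NumPy and scikit-learn, not the dissertation's algorithms; the function name and the symmetrization rule are assumptions made only for this sketch.

import numpy as np
from sklearn.cluster import KMeans

def spectral_step_from_coefficients(C, n_clusters):
    """Infer cluster labels from a self-representation coefficient matrix C (n x n)."""
    # Symmetric affinity matrix from the (possibly asymmetric) coefficients.
    W = 0.5 * (np.abs(C) + np.abs(C).T)
    # Symmetrically normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    # Bottom eigenvectors act as a relaxed cluster indicator matrix Q;
    # row-normalize before running k-means on the rows.
    _, eigvecs = np.linalg.eigh(L_sym)
    Q = eigvecs[:, :n_clusters]
    Q = Q / np.maximum(np.linalg.norm(Q, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Q)

Given a coefficient matrix C estimated by any affinity-learning model, labels = spectral_step_from_coefficients(C, K) returns the K cluster labels; the unified models discussed above differ in that they couple this indicator inference with the learning of C instead of running the two stages separately.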
Keywords/Search Tags:High-dimensional data clustering, Subspace clustering, Spectral clustering, Cluster indicator matrix, Affinity matrix