
High-dimensional Data Clustering Method Based On Embedded Subspace

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Li
Full Text: PDF
GTID: 2568306500950279
Subject: Computer Science and Technology

Abstract/Summary:
Real-world data sets are often high-dimensional and may contain redundant features or noise, so it is difficult to obtain good clustering results by processing such data directly. This thesis starts from the observation that a low-dimensional feature subspace is always embedded in the high-dimensional feature space, and that this subspace better reveals the spatial distribution of the data and the relationships between samples. Two clustering methods based on the embedded subspace are proposed, both of which incorporate manifold learning on the subspace; during the solution process, graph constraints are imposed on the learned manifold to complete the clustering.

Most clustering methods related to non-negative matrix factorization decompose the original feature space directly, ignoring the fact that the low-dimensional subspace embedded in the high-dimensional feature space is more meaningful for data representation. The first proposed method, a high-dimensional data clustering method based on matrix factorization, unifies the clustering and dimensionality-reduction objectives in a single framework. Within this framework the clustering algorithm is executed on the embedded subspace, which yields a more accurate and interpretable result. In addition, the algorithm uses the l2,1-norm instead of the traditional l2-norm, making the model less sensitive to outliers in the data. To preserve the local structure of the data as much as possible, the algorithm also reconstructs an affinity matrix for learning and introduces manifold learning into the clustering indicator matrix.

The second method, the K-multiple-means clustering algorithm, extends K-means. K-means models each cluster with a single center, and this assumption about cluster shape makes it difficult to capture data with non-convex distributions; moreover, many clusters contain multiple sub-clusters that cannot be represented by a single prototype. The K-multiple-means algorithm partitions the data set into a specified number of clusters and, unlike methods that use an agglomeration strategy, formulates the multiple-means clustering problem as an optimization problem. Executed on the subspace, it dynamically learns a new affinity graph in the subspace from the affinity graph of the original space, obtaining an optimal affinity matrix that reflects the data structure; the optimal graph learned in the subspace is then constrained so that it partitions the data into the final k classes. The thesis integrates graph learning and clustering on the data subspace into one framework and obtains the clustering results through iterative updates.

Nine data sets are used in the experiments. In the comparison experiments, the two algorithms achieve higher clustering ACC, NMI, and Purity values than the competing algorithms on most data sets, verifying the effectiveness of both methods. The analysis in the experimental part further reflects the superior performance of the two algorithms.
Keywords/Search Tags: Clustering, Subspace, Manifold learning, Matrix factorization, Multiple means