
High-dimensional Data Clustering Method Based On Embedded Subspace

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Li
Full Text: PDF
GTID: 2568306500950279
Subject: Computer Science and Technology

Abstract/Summary:
Real-world data sets are often high-dimensional and may contain redundant features or noise, so it is difficult to obtain good clustering results by processing such data directly. This thesis starts from the observation that a low-dimensional feature subspace is always embedded in the high-dimensional feature space, and that this subspace better reveals the spatial distribution of the data and the relationships between samples. Two clustering methods based on the embedded subspace are proposed, both of which incorporate manifold learning on the subspace; during the solution process, graph constraints are imposed on the learned manifold to complete the clustering.

Most clustering methods related to non-negative matrix factorization decompose the original feature space directly, ignoring the fact that the low-dimensional subspace embedded in the high-dimensional feature space is more meaningful for data representation. The first proposed method, a high-dimensional data clustering method based on matrix factorization, unifies the clustering and dimensionality-reduction objectives in a single framework. Within this framework the clustering algorithm is executed on the embedded subspace, which yields a more accurate and interpretable result. In addition, the algorithm uses the l2,1-norm instead of the traditional l2-norm, making the model less sensitive to outliers in the data. To preserve the local structure of the data as much as possible, the algorithm also reconstructs an affinity matrix for learning and introduces manifold learning into the clustering indicator matrix.

The second method, the K-multiple-means clustering algorithm, extends K-means. K-means models each cluster with a single center, and this assumption about cluster shape makes it difficult to capture data with non-convex distributions; moreover, many clusters contain multiple sub-clusters that cannot be represented by a single prototype. The K-multiple-means algorithm partitions the data set into a specified number of clusters and, unlike methods that use an agglomeration strategy, formulates the multiple-means clustering problem as an optimization problem. Executed on the subspace, it dynamically learns a new affinity graph in the subspace from the affinity graph of the original space, obtaining an optimal affinity matrix that reflects the data structure; the optimal graph learned in the subspace is then constrained so that it partitions the data into the final k classes. The thesis integrates graph learning and clustering on the data subspace into one framework and obtains the clustering results through iterative updates.

Nine data sets are used in the experiments. In the comparison experiments, the two algorithms achieve higher clustering ACC, NMI, and Purity values than the competing algorithms on most data sets, verifying the effectiveness of both methods. The analysis in the experimental part further reflects the superior performance of the two algorithms.
Keywords/Search Tags: Clustering, Subspace, Manifold learning, Matrix factorization, Multiple means