Subspace Clustering Method Of High Dimensional Data

Posted on: 2021-03-24 | Degree: Master | Type: Thesis
Country: China | Candidate: L J Zheng | Full Text: PDF
GTID: 2428330605473026 | Subject: Computer Science and Technology
Abstract/Summary:
Because of the curse of dimensionality, neither the efficiency of clustering nor the accuracy of its results can easily be guaranteed on high-dimensional data. To reduce this effect, a subspace clustering algorithm is used to generate subspaces of the high-dimensional data set, and the result of clustering within those subspaces serves as the basis for data analysis. In this process, the quality of the subspaces is the key to the effectiveness of the subspace clustering algorithm. There are two ways to improve subspace quality: one is to formulate effective generation rules while the subspaces are being generated; the other is to simplify the subspaces after generation according to a screening strategy. This thesis adopts both. First, subspaces are generated from the dimensions with high dimension density in the high-dimensional data. Second, adaptive grids are generated in each subspace, and the data in the subspace are simplified according to grid density. Then, low-density dimensions in the subspace are pruned again according to dimension density to improve subspace quality. In the clustering stage, grid-based clustering is used to cluster the subspaces according to the adjacency of grid cells. Experimental results show that the algorithm performs well on UCI data sets and in experiments on noise resistance, scalability, and efficiency.

For high-dimensional data that contain uncertain data, a subspace clustering algorithm for high-dimensional uncertain data is proposed to keep the uncertainty from affecting the clustering results, with separate solutions for dimensional uncertainty and value uncertainty. The uncertain data are first made deterministic and then clustered. To improve efficiency, a different determinization method is applied to each type of uncertain data. For value uncertainty, the KNN algorithm finds the k nearest neighbors of each uncertain data item, from which a deterministic representation of the item is obtained. For dimensional uncertainty, the definite dimensions of the data set are obtained from the dimension similarity of the uncertain dimensions. The CLIQUE algorithm is then used to cluster the resulting deterministic high-dimensional data. Experimental results show that the algorithm performs well on UCI data sets, clusters high-dimensional uncertain data effectively with high-quality results, and shows some robustness and noise resistance across different kinds of high-dimensional uncertain data sets.
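The two mechanisms the abstract names, KNN-based determinization of uncertain values and clustering by grid-cell adjacency, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `knn_fill` and `grid_cluster` are hypothetical, uncertain values are modeled as `None`, the "deterministic representation" is assumed to be the mean over the k nearest complete rows, and a fixed-width grid stands in for the adaptive grid. Dimension-density pruning and CLIQUE itself are not shown.

```python
import math
from collections import deque


def knn_fill(rows, k=3):
    """Replace None entries with the mean of that dimension over the
    k nearest complete rows (assumes at least one complete row exists)."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        known = [i for i, v in enumerate(r) if v is not None]
        # Distance is measured only on the dimensions observed in r.
        nearest = sorted(
            complete,
            key=lambda c: math.dist([r[i] for i in known], [c[i] for i in known]),
        )[:k]
        filled.append([
            v if v is not None else sum(c[i] for c in nearest) / len(nearest)
            for i, v in enumerate(r)
        ])
    return filled


def grid_cluster(points, cell_width, min_points):
    """Label points by joining dense grid cells that share a face.
    Points in sparse cells get the label -1 (treated as noise)."""
    cells = {}
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_width) for x in p)
        cells.setdefault(key, []).append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over adjacent dense cells.
        queue = deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

In this sketch, a data set with uncertain values would first pass through `knn_fill` and the resulting rows would then be given to `grid_cluster`; the cell width and density threshold are free parameters here, whereas the abstract describes grids that adapt to the data.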
Keywords/Search Tags:High dimension, Subspace, Uncertain data, K Nearest Neighbor query