
Research On High Dimensional Data Clustering Algorithm Based On Deep Learning

Posted on: 2020-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhu
Full Text: PDF
GTID: 2428330590452627
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, the scale and dimensionality of data are constantly growing, and data increasingly exhibits high-dimensional characteristics. Clustering is one of the most commonly used methods for data analysis. However, because high-dimensional data contains many irrelevant attributes, is sparsely distributed, and is computationally expensive to process, traditional clustering algorithms do not perform well on it. Subspace clustering is an effective solution to the high-dimensional clustering problem: it transforms the high-dimensional feature space into a low-dimensional feature space in which clustering is performed. It can be implemented by methods such as principal component analysis (PCA), the sparse subspace clustering algorithm (SSC), and the low rank representation algorithm (LRR).

Although these subspace clustering methods have achieved good results, they have two limitations. First, the data representations learned by shallow models may not capture the complex latent structure of high-dimensional data. Second, the entire data set is used as a dictionary to learn features, which makes large-scale data sets difficult to handle. Deep learning is considered an effective means of solving these problems because of its excellent feature learning ability and fast inference ability.

In view of the above problems, we studied subspace clustering algorithms and autoencoders, and then proposed a Cascade Subspace Clustering algorithm Based on Local Structure Preservation (ICSC). ICSC minimizes the difference between the distributions of sample points in two distance metric spaces, which offers a research direction for the problem of feature learning that requires the entire data set as a dictionary. In addition, ICSC uses a decoder to align the feature data with the local structure of the original data and to capture the underlying structure of high-dimensional data. We conducted experiments on the ICSC algorithm, evaluated some of its parameters, and found a parameter setting that optimizes the performance of the algorithm. Finally, ICSC was compared with other clustering algorithms on multiple data sets, and three kinds of data were used. Commonly used clustering evaluation indicators were used to analyze the experimental results. The results show the effectiveness and superiority of the ICSC algorithm.
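The idea of comparing how sample points are distributed in two distance metric spaces can be illustrated with a small sketch. The abstract does not state which metrics or which loss ICSC uses, so everything below is an assumption made for illustration: Euclidean distance in both spaces, normalization by the largest pairwise distance, a mean squared gap as the discrepancy, and the hypothetical names `pairwise_distances` and `structure_discrepancy`. It is not the ICSC objective.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all unordered sample pairs (i < j), in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def structure_discrepancy(original, embedded):
    """Mean squared gap between scale-normalized pairwise distances.

    Assumption for illustration only: each distance list is divided by its
    own maximum so that spaces of different dimensionality and scale can be
    compared. This is not the loss defined in the thesis.
    """
    if len(original) != len(embedded):
        raise ValueError("both spaces must contain the same number of samples")
    if len(original) < 2:
        raise ValueError("at least two samples are needed")
    d_orig = pairwise_distances(original)
    d_emb = pairwise_distances(embedded)
    max_orig = max(d_orig) or 1.0  # guard against all-identical points
    max_emb = max(d_emb) or 1.0
    gaps = [(a / max_orig - b / max_emb) ** 2 for a, b in zip(d_orig, d_emb)]
    return sum(gaps) / len(gaps)


# Toy data: four 3-D samples forming two tight pairs that lie far apart.
X = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (10.0, 0.0, 0.0), (10.0, 0.0, 1.0)]

# Hypothetical 1-D embeddings: the first keeps the two groups apart,
# the second interleaves them and so distorts the neighborhood structure.
Z_good = [(0.0,), (0.1,), (1.0,), (1.1,)]
Z_bad = [(0.0,), (1.0,), (0.1,), (1.1,)]

loss_good = structure_discrepancy(X, Z_good)
loss_bad = structure_discrepancy(X, Z_bad)
print(loss_good < loss_bad)
```

In this toy setting the embedding that keeps neighboring samples close receives the smaller discrepancy, which is the kind of signal a structure-preserving feature learner could minimize during training.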
Keywords/Search Tags: high-dimensional data, subspace clustering, autoencoder, local structure