Research On High Dimensional Data Clustering Algorithm Based On Deep Learning

Posted on:2020-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:N Zhu

GTID:2428330590452627

Subject:Computer technology

Abstract/Summary:

With the rapid development of information technology, the scale and dimensionality of data are constantly growing, and data increasingly exhibit high-dimensional characteristics. Clustering is one of the most commonly used methods for data analysis. However, because high-dimensional data contain many irrelevant attributes, are sparsely distributed, and are computationally expensive to process, traditional clustering algorithms do not perform well on them. Subspace clustering is an effective solution to the high-dimensional clustering problem: it transforms the high-dimensional feature space into a low-dimensional feature space in which clustering is performed. It can be implemented by methods such as principal component analysis (PCA), sparse subspace clustering (SSC), and low-rank representation (LRR). Although these subspace clustering methods have achieved good results, they have two limitations. First, the data representations learned by shallow models may not capture the complex latent structure of high-dimensional data. Second, the entire data set is used as a dictionary to learn features, which makes large-scale data sets difficult to handle. Deep learning is considered an effective means of solving these problems because of its excellent feature learning ability and fast inference. In view of these problems, we studied subspace clustering algorithms and autoencoders, and then proposed a Cascade Subspace Clustering algorithm Based on Local Structure Preservation (ICSC). ICSC minimizes the difference between the distributions of sample points in two distance metric spaces, which offers an approach to the problem that feature learning requires the entire data set as a dictionary. In addition, ICSC uses a decoder to align the local structure of the feature data with that of the original data and thereby capture the underlying structure of high-dimensional data. We conducted experiments on the ICSC algorithm, evaluated several of its parameters, and identified a parameter setting that gives the best performance of the algorithm. Finally, ICSC was compared with other clustering algorithms on multiple data sets covering three kinds of data, and commonly used clustering evaluation metrics were used to analyze the experimental results. The results show the effectiveness and superiority of the ICSC algorithm.
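
To make concrete what "using the entire data set as a dictionary" means, the following is a minimal sketch of shallow self-expressive subspace clustering. It is not the ICSC algorithm and is not taken from the thesis. As a simplifying assumption it replaces the sparse (SSC) or low-rank (LRR) penalty with a ridge penalty, which gives a closed-form coefficient matrix, and then applies a basic spectral clustering step. The function names, the parameter `lam`, and the toy data are all hypothetical. Note that the coefficient matrix has one row and one column per sample, which is why this family of methods scales poorly to large data sets.

```python
import numpy as np

def self_expressive_affinity(X, lam=1e-3):
    """X has shape (n_features, n_samples). Every sample is reconstructed
    from all samples (the whole data set acts as the dictionary) by solving
    min_C ||X - XC||_F^2 + lam * ||C||_F^2 (ridge stand-in, an assumption)."""
    n = X.shape[1]
    G = X.T @ X                                  # n x n Gram matrix
    C = np.linalg.solve(G + lam * np.eye(n), G)  # closed-form coefficients
    np.fill_diagonal(C, 0.0)                     # drop self-reconstruction
    return np.abs(C) + np.abs(C).T               # symmetric affinity

def spectral_embedding(W, k):
    """Rows of the k eigenvectors of the normalized Laplacian with the
    smallest eigenvalues, rescaled to unit length."""
    d = np.maximum(W.sum(axis=1), 1e-12)
    s = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    E = vecs[:, :k]
    return E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)

def kmeans(E, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

def cluster_subspaces(X, k, lam=1e-3):
    """Affinity from self-expression, then spectral embedding, then k-means."""
    W = self_expressive_affinity(X, lam)
    return kmeans(spectral_embedding(W, k), k)

if __name__ == "__main__":
    # Toy data (hypothetical): two 2-dimensional subspaces in 20 dimensions.
    rng = np.random.default_rng(0)
    bases = [np.linalg.qr(rng.standard_normal((20, 2)))[0] for _ in range(2)]
    X = np.hstack([B @ rng.standard_normal((2, 30)) for B in bases])
    print(cluster_subspaces(X, k=2))
```

In this sketch the affinity matrix alone needs memory quadratic in the number of samples, which illustrates the scalability limitation of shallow self-expressive methods described in the abstract.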

Keywords/Search Tags:

high-dimensional data, subspace clustering, autoencoder, local structure

Related items

1	Research On Clustering Algorithm Based On High Dimensional Data
2	On Robust And High-Dimensional Data Clustering
3	Subspace Clustering Based On Sparse Representation
4	Research On Subspace Clustering Algorithms For High-dimensional Data
5	Study On High-dimensional Data Subspace Clustering Analysis And Application
6	Research On Key Technologies Of Clustering High-dimensional Data Based On Sparse Subspace And Their Applications
7	Research On Subspace Clustering Algorithm For High Dimensional Data
8	Improvement Research Of Clustering Algorithm Based On High-dimensional Data
9	Research On Clustering Algorithm Based On Subspace In High-dimensional Data Streams
10	Research Of Subspace-clustering Algorithms Based On Density Over High-dimensional Data