Research On Clustering Algorithm Based On High Dimensional Data

Posted on: 2018-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: D S Shi
GTID: 2358330536956137
Subject: Applied Mathematics
Abstract/Summary:
In recent years, with the rapid development of Internet technology, the scale and dimensionality of data have increased dramatically, giving rise to the curse of dimensionality and to sparse data density. High-dimensional data usually contain many redundant or irrelevant features as well as noise, which poses great challenges for clustering analysis. Studies have found that the cluster structure of high-dimensional data usually lies in subspaces of the data rather than in the full data space. To deal with such data, researchers in China and abroad have proposed many subspace clustering algorithms, among which soft subspace clustering is an important research topic. Soft subspace clustering assigns a weight to each feature of the samples and identifies the subspace of each cluster from the features with larger weights. However, each individual feature of high-dimensional data carries only a weak signal, so it is difficult to discover cluster structure from single weak features, and weighting features one at a time is poorly suited to data with tens of thousands of features.

Many high-dimensional data sets integrate observations of different aspects of the same objects, so their features can be grouped by aspect, and different feature groups matter to different degrees in different clusters. FG-k-means exploits this by introducing two-level weighting, on feature groups and on individual features, and achieves good clustering results on high-dimensional data. However, FG-k-means cannot group features automatically: it relies on prior knowledge of the feature groups, which is unavailable for many high-dimensional data sets. To address these problems, this thesis takes high-dimensional data as its research object. The main work consists of the following two parts.

(1) Latent feature group learning in subspace clustering (LFGL). Previous methods cannot group features automatically during clustering and must group them according to prior knowledge, yet the feature groups of many high-dimensional data sets are unknown. The proposed LFGL model first constructs a feature grouping model (FGM), then embeds the FGM into a subspace clustering algorithm to formulate an optimization problem, and finally solves this problem under the constraints of the FGM. Compared with previous clustering methods, LFGL not only groups features automatically but also achieves better clustering results.

(2) Dimensionality reduction and clustering analysis based on deep denoising sparse autoencoders (DDSAE). Because of the curse of dimensionality and sparse data density, the performance of many clustering algorithms degrades markedly as the dimension increases, and on ultra-high-dimensional data they may even run out of memory. This thesis exploits the nonlinear representation ability of autoencoders: an L2-norm penalty is introduced to prevent overfitting, noise is added to the input data to improve the robustness of the model, and cross-entropy is used as the loss function. Multiple such autoencoders are then stacked to form deep denoising sparse autoencoders. DDSAE learns essential low-dimensional abstract features from high-dimensional data, and the LFGL model from Chapter 3 is then applied to the resulting low-dimensional vectors for clustering. Experimental comparisons with PCA and LLE show that this method performs better in dimensionality reduction and clustering of high-dimensional data. In addition, comparing the clustering results of DDSAE with those of LFGL shows that DDSAE achieves better clustering, which further demonstrates the effectiveness of the method.
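The two-level weighting that FG-k-means introduces, and that LFGL builds on, can be sketched as a weighted k-means loop. This is a minimal, hypothetical sketch rather than the thesis's algorithm: it assumes entropy-regularized weights (a softmax of negative dispersions, in the style of entropy-weighted k-means) for both the group weights `W[l][t]` and the within-group feature weights `V[l][j]`, assumes the feature groups are given in advance (as FG-k-means requires; LFGL's automatic grouping is not shown), and uses a deterministic farthest-first initialization. The names `two_level_weighted_kmeans`, `lam` and `eta` are illustrative.

```python
import math


def softmax_neg(costs, temperature):
    """exp(-c/T) normalized to sum to 1; shifting by min(costs) avoids overflow."""
    m = min(costs)
    exps = [math.exp(-(c - m) / temperature) for c in costs]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def farthest_first(X, k):
    """Deterministic init: start at X[0], then add the point farthest from chosen centers."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    return centers


def two_level_weighted_kmeans(X, k, groups, lam=1.0, eta=1.0, n_iter=20):
    """Hypothetical sketch of k-means with two levels of per-cluster weights.

    X: list of samples (lists of floats).
    groups: list of lists of column indices that partition ALL features
            (assumed known in advance, as in FG-k-means).
    W[l][t]: weight of group t in cluster l (sums to 1 over t).
    V[l][j]: weight of feature j in cluster l (sums to 1 within its group).
    lam, eta: entropy-regularization temperatures for W and V.
    """
    n, d = len(X), len(X[0])
    group_of = {j: t for t, g in enumerate(groups) for j in g}
    Z = farthest_first(X, k)
    W = [[1.0 / len(groups)] * len(groups) for _ in range(k)]
    V = [[1.0 / len(groups[group_of[j]]) for j in range(d)] for _ in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # 1) assign each sample to the cluster with the smallest weighted
        #    squared distance sum_j W[l][t(j)] * V[l][j] * (x_j - z_lj)^2
        for i, x in enumerate(X):
            cost = [sum(W[l][group_of[j]] * V[l][j] * (x[j] - Z[l][j]) ** 2
                        for j in range(d)) for l in range(k)]
            labels[i] = min(range(k), key=cost.__getitem__)
        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            # 2) move the center to the mean of its members (kept if empty)
            if members:
                Z[l] = [sum(col) / len(members) for col in zip(*members)]
            # per-feature within-cluster dispersion
            disp = [sum((x[j] - Z[l][j]) ** 2 for x in members) for j in range(d)]
            # 3) feature weights: softmax of -(W * dispersion) inside each group
            for t, g in enumerate(groups):
                new_v = softmax_neg([W[l][t] * disp[j] for j in g], eta)
                for j, v in zip(g, new_v):
                    V[l][j] = v
            # 4) group weights: softmax of -(V-weighted dispersion of the group)
            W[l] = softmax_neg([sum(V[l][j] * disp[j] for j in g) for g in groups], lam)
    return labels, W, V
```

On toy data where only the first feature group separates the clusters, both clusters end up placing almost all of their group weight on that group. Passing `groups` explicitly mirrors FG-k-means' dependence on prior knowledge; learning that partition during clustering is the part LFGL adds, and this sketch does not attempt it.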
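The objective of a single DDSAE building block in part (2) can be sketched as follows: corrupt the input, encode it, reconstruct it, and score the reconstruction against the clean input with cross-entropy plus an L2 weight penalty. This is a minimal, hypothetical sketch, not the thesis's implementation. It assumes masking noise as the corruption, sigmoid units, tied decoder weights (the transpose of `W`) and inputs scaled to [0, 1] so that cross-entropy applies. The sparsity penalty implied by the model's name is not shown, and gradient-descent training is omitted. Names such as `dae_loss` and `stacked_encode` are illustrative.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def encode(x, W, b):
    """One encoder layer: h = sigmoid(W x + b); W has one row per hidden unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def masking_noise(x, rate, rng):
    """Assumed corruption: set each input to 0 with probability `rate`."""
    return [0.0 if rng.random() < rate else xi for xi in x]


def dae_loss(x, W, b, c, l2, rate=0.3, seed=0):
    """Loss of one denoising autoencoder layer on a single sample x in [0, 1]^d.

    The corrupted input is encoded, decoded with tied weights W^T (an
    assumption), and compared with the CLEAN input via cross-entropy;
    an L2 penalty 0.5 * l2 * ||W||^2 is added against overfitting.
    Returns (loss, hidden code h).
    """
    rng = random.Random(seed)
    h = encode(masking_noise(x, rate, rng), W, b)
    y = [sigmoid(sum(W[k][i] * h[k] for k in range(len(h))) + c[i])
         for i in range(len(x))]
    eps = 1e-12  # guards log(0)
    cross_entropy = -sum(xi * math.log(yi + eps) + (1 - xi) * math.log(1 - yi + eps)
                         for xi, yi in zip(x, y))
    penalty = 0.5 * l2 * sum(w * w for row in W for w in row)
    return cross_entropy + penalty, h


def stacked_encode(x, layers):
    """Forward pass through a stack of trained encoders: each layer's code feeds the next."""
    for W, b in layers:
        x = encode(x, W, b)
    return x
```

Stacking, under these assumptions, means training one layer at a time on the codes produced by the layer below; `stacked_encode` shows only the resulting forward pass, which would yield the low-dimensional vectors that are then clustered.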
Keywords/Search Tags: Subspace clustering, High-dimensional data, AutoEncoder, Dimensional simplification