Research On Deep Embedding Algorithm For Cluster Analysis

Posted on: 2022-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2518306527484314
Subject: Control Science and Engineering
Abstract/Summary:
Cluster analysis is a mature data analysis technique in data mining and has been studied extensively in fields such as machine learning and pattern recognition. Among the many effective clustering algorithms available, spectral clustering has attracted growing attention because of its wide applicability and extensibility. However, most existing spectral clustering algorithms incur high computational complexity and large memory overhead when processing massive data, and their performance tends to degrade as the dimensionality of the dataset increases, which leads to unsatisfactory clustering results. To meet the demand for analyzing high-dimensional massive data, this thesis focuses on reducing time and space complexity to improve efficiency, and on applying dimensionality reduction techniques, represented by the deep autoencoder, to improve accuracy. The improved algorithm is also applied to community detection in large-scale complex networks. The main contributions are as follows:

(1) Most subspace clustering algorithms built on the spectral clustering framework cannot effectively capture the geometric structure of the data when mapping high-dimensional data into a low-dimensional subspace. To address this problem, a deep subspace clustering algorithm with a low-rank constrained prior (DSC-LRC) is proposed, which preserves both global and local structure information. Low-rank representation (LRR) is combined with the deep autoencoder: the global structure of the data is captured by a low-rank constraint, and the latent features of the constrained neural network are represented as low rank. The data are mapped nonlinearly into a latent space by minimizing the difference between reconstructions and inputs, which preserves the local features of the data. A normalization layer is introduced so that cluster assignments are predicted as probabilities, and parameter updating and clustering optimization are carried out in an unsupervised joint learning framework. Experiments show that the proposed algorithm achieves good clustering performance on high-dimensional datasets.

(2) Most existing spectral clustering algorithms suffer from low clustering accuracy and costly storage of the similarity matrix. To address these problems, a deep embedding clustering algorithm that combines metric fusion and landmark representation is proposed. First, instead of random sampling, the concept of relative mass is introduced to evaluate node quality. On this basis, the most representative nodes are selected as landmark points, and the graph similarity matrix is approximated by sparse representation. Meanwhile, to account for the geometric and topological distribution of the nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmarks and the other points, which increases clustering precision. The eigendecomposition is replaced by a stacked autoencoder to construct a deep embedding model, and the resulting similarity matrix is used as the model's input. Clustering accuracy is further improved by jointly learning the embedded representation and the clustering. Experiments show that the algorithm achieves good clustering performance on large datasets.

(3) In large-scale complex networks, existing embedding algorithms have difficulty capturing and preserving network structure when learning low-dimensional network representations. Based on the low-dimensional representation and the similarity matrix construction scheme proposed above, a deep embedding network model for community detection is proposed. The input matrix of the model combines adaptive similarity measurement with landmark representation to retain global and local structure information while reducing memory consumption. In addition, inspired by the deep autoencoder and the non-negative matrix factorization (NMF) model, the single-layer NMF mapping is extended to a multi-layer NMF mapping, which contains a decoder module and an encoder module, in order to learn low-dimensional network structures. This yields more accurate and more efficient community detection. The validity of the proposed algorithm is verified by experiments on nine real-world networks.
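
The landmark-based similarity construction described in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's implementation, and several choices are assumptions made only for illustration: landmarks are drawn at random (the relative-mass selection rule is not specified in the abstract), the two distances are fused with a hypothetical weight `alpha`, similarities use an `exp(-d)` kernel, and each point keeps weights to its `r` nearest landmarks. The function names `kendall_tau_distance` and `landmark_similarity` are likewise made up for this example.

```python
import numpy as np

def kendall_tau_distance(a, b):
    """Normalized Kendall tau distance in [0, 1]:
    the fraction of coordinate pairs that a and b order differently."""
    i, j = np.triu_indices(len(a), k=1)
    discordant = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j]) < 0
    return discordant.mean()

def landmark_similarity(X, landmarks, r=3, alpha=0.5):
    """Sparse n x p similarity matrix between n points and p landmarks.

    alpha (assumed) weights the Euclidean against the Kendall tau distance;
    each point keeps normalized weights to its r nearest landmarks only.
    """
    n, p = X.shape[0], landmarks.shape[0]
    # Pairwise Euclidean distances, rescaled to [0, 1] to match Kendall tau.
    eu = np.linalg.norm(X[:, None, :] - landmarks[None, :, :], axis=2)
    eu = eu / (eu.max() + 1e-12)
    # Pairwise Kendall tau distances over the feature coordinates.
    kt = np.array([[kendall_tau_distance(x, l) for l in landmarks] for x in X])
    d = alpha * eu + (1.0 - alpha) * kt
    # Keep only the r nearest landmarks per point and normalize each row.
    Z = np.zeros((n, p))
    nearest = np.argsort(d, axis=1)[:, :r]
    rows = np.arange(n)[:, None]
    w = np.exp(-d[rows, nearest])
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Assumption: random landmarks stand in for the relative-mass selection.
landmarks = X[rng.choice(200, size=20, replace=False)]
Z = landmark_similarity(X, landmarks)
```

Storing the n x p matrix `Z` with only `r` nonzeros per row, rather than a dense n x n similarity matrix, is where the memory saving of a landmark representation comes from. In the approach the abstract describes, a matrix of this kind is fed to a stacked autoencoder instead of being eigendecomposed.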
Keywords/Search Tags: spectral clustering, low rank constraint, deep autoencoder, metric fusion, landmark representation, non-negative matrix factorization, community detection