Clustering is widely used in pattern recognition, text classification, information retrieval, and biomedicine because it can analyze and mine data effectively. With the rapid development of information technology and multimedia, real-world data are increasingly high-dimensional and structurally complex, and traditional clustering algorithms cannot handle such data effectively. Subspace clustering assumes that the points belonging to the same cluster in high-dimensional data lie in a low-dimensional subspace, and it has shown good results on high-dimensional clustering problems. Deep subspace clustering combines subspace clustering with deep learning and achieves good results on data distributed in nonlinear subspaces. However, the accuracy of existing deep subspace clustering algorithms still needs improvement; it is limited by factors such as insufficient information mining. Meanwhile, most existing subspace clustering algorithms have high computational complexity and are therefore ill-suited to large data volumes. Both large-scale data and multi-view data involve large data volumes, which poses a challenge for subspace clustering. Multi-view data also carry additional information that can be exploited to obtain better clustering results, but most existing multi-view clustering algorithms fail to make full use of this information and are difficult to extend to large-scale datasets. To address these issues, this thesis focuses on the following three aspects of subspace clustering:

(1) To address the problem of high-dimensional data clustering, a deep double self-expressive subspace clustering algorithm (DSESC) based on an auto-encoder is proposed. The algorithm builds on the self-expressive model and adds a fully connected layer to the auto-encoder as a self-expressive layer. The self-expression coefficients are then treated as a representation of the samples, and a second self-expressive layer is added to perform a secondary self-expression. Finally, a similarity matrix is constructed from the two self-expression coefficient matrices, and spectral clustering is applied to obtain the final clustering result. In addition, a self-supervised module inspired by contrastive learning is designed to improve the performance of the algorithm. Experiments on four mainstream public datasets demonstrate the effectiveness of DSESC.

(2) To handle multi-view data effectively, a multi-view clustering algorithm based on multi-similarity (MVCCMS) is proposed. The algorithm exploits both dimensions of multi-view data, samples and features, for clustering. First, cluster integration is used to compute multiple similarities for the data of each view: sample-sample, feature-feature, and sample-feature. The similarities of each view are used to construct a hybrid graph for that view. The per-view graphs are then fused by weighting into a comprehensive graph, and the clustering result is obtained by spectral clustering. Experiments on several public multi-view benchmark datasets demonstrate the effectiveness of the proposed algorithm.

(3) To process large-scale data efficiently, a deep embedding subspace clustering network (SE-DESC) is proposed. The algorithm consists of two parts. The first part is a self-expression network, which solves for the self-expression coefficients between data points, reduces the number of parameters of the self-expression model to a constant, and avoids high computational complexity. The second part is an embedding network with orthogonal layers, which replaces spectral clustering by mapping the original data into the spectral space, thereby avoiding the high computational complexity of spectral clustering. Finally, a set of soft assignments is obtained through a softmax layer, and a target distribution is introduced; the performance of the algorithm is improved by minimizing the cross-entropy loss between the soft assignments and the target distribution. Experiments on mainstream large-scale public datasets, both single-view and multi-view, demonstrate the effectiveness of the proposed algorithm.
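
The double self-expression idea in (1) can be illustrated with a small NumPy sketch. It is not the DSESC network itself. It assumes a ridge-regularized linear self-expression with a closed-form solution, applied directly to raw features rather than to auto-encoder codes, and it omits the self-supervised module. The function names, the regularization weight `lam`, and the rule for combining the two coefficient matrices are hypothetical choices made for illustration.

```python
import numpy as np


def self_expression(Z, lam=0.1):
    """Ridge self-expression: argmin_C ||Z - C Z||_F^2 + lam * ||C||_F^2.

    Rows of Z are samples. Setting the gradient to zero gives
    C (G + lam I) = G with G = Z Z^T; since G is symmetric, this equals
    (G + lam I)^{-1} G, which is what the linear solve returns.
    """
    G = Z @ Z.T
    return np.linalg.solve(G + lam * np.eye(len(Z)), G)


def double_self_expression_affinity(X, lam=0.1):
    # First self-expression: each sample is expressed by the other samples.
    C1 = self_expression(X, lam)
    # Second self-expression: rows of C1 are treated as new sample representations.
    C2 = self_expression(C1, lam)
    # Assumed combination rule (not taken from the thesis): sum of magnitudes,
    # then symmetrization so the result is a valid affinity matrix.
    A = np.abs(C1) + np.abs(C2)
    return 0.5 * (A + A.T)


# Toy data: two points on the x-axis and two on the y-axis,
# i.e. two one-dimensional subspaces of R^3.
X = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
])
A = double_self_expression_affinity(X)
print(np.round(A, 3))
```

On this toy input the affinity is block-diagonal: points in the same subspace are connected and points in different subspaces are not. A matrix of this kind is what would be passed to spectral clustering as a precomputed affinity.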
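
The final training step in (3), soft assignments from a softmax layer matched to a target distribution by cross-entropy, can be sketched as follows. The abstract does not specify how the target distribution is built. This sketch assumes the sharpening rule popularized by Deep Embedded Clustering (square each assignment, divide by the cluster frequency, renormalize), and all function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_distribution(q):
    """Assumed DEC-style target: p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        norm = sum(weights)
        p.append([w / norm for w in weights])
    return p


def cross_entropy(p, q, eps=1e-12):
    """Mean cross-entropy between target rows p and soft-assignment rows q."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        total -= sum(pi * math.log(qi + eps) for pi, qi in zip(p_row, q_row))
    return total / len(q)


# Hypothetical network outputs for three samples and two clusters.
logits = [[2.0, 0.5], [1.5, 0.2], [0.1, 1.0]]
q = [softmax(row) for row in logits]   # soft assignments
p = target_distribution(q)             # sharpened targets
loss = cross_entropy(p, q)
print(q, p, loss)
```

Under this assumed rule the target emphasizes confident assignments, so minimizing the loss pushes each soft assignment toward its dominant cluster. In a real network the target would be held fixed while gradients flow through `q`.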