
A Research On Subspace Clustering Based On Deep Learning

Posted on: 2022-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J C Lv
Full Text: PDF
GTID: 2518306524489294
Subject: Master of Engineering
Abstract/Summary:
With the popularization of communication technology and the Internet, people can now obtain an amount of information that was unimaginable before. Data has become a factor of production and plays an important role in all walks of life. Since most data is unlabeled, and creating labels requires a great deal of manpower, clustering methods, which divide data into clusters based on correlations among its intrinsic attributes, can effectively analyze data to extract valuable information that guides production. Benefiting from the proposal and development of deep neural networks, combining traditional clustering with deep learning to exploit its feature-extraction capability has attracted growing attention from researchers. Auto-Encoder (AE) based deep subspace clustering (DSC) methods have achieved impressive performance owing to the powerful representations extracted by deep neural networks while prioritizing categorical separability. However, the self-reconstruction loss of the AE ignores rich and useful relation information and may lead to indiscriminative representations, which inevitably degrades clustering performance. It is also challenging to learn high-level similarity without feeding in semantic labels. Another unsolved problem facing DSC is the huge memory cost of the n×n similarity matrix incurred by the self-expression layer between the encoder and the decoder.

To handle these problems, we propose Pseudo-supervised Deep Subspace Clustering (PSSC). We use pairwise similarity to weight the reconstruction loss so as to capture local structure information, while the similarity itself is learned by virtue of the self-expression layer. A pseudo-graph and pseudo-labels, which allow us to take advantage of the uncertain knowledge acquired during training, are further employed to supervise similarity learning. Joint learning and iterative training facilitate obtaining an overall optimal solution. Through extensive experiments, this thesis compares the proposed method with related methods on multiple datasets, and the excellent clustering results of the proposed method demonstrate its superiority.

The huge memory cost caused by the self-expression layer prevents the proposed model from being directly applied to large-scale datasets. To solve the large-scale and out-of-sample problems, we further combine the model with the k-nearest-neighbor algorithm: after clustering a small batch of data, the k-nearest-neighbor algorithm extends the clustering results of that batch to the entire dataset. In this way the proposed model is successfully applied to large-scale datasets, and experiments verify the effectiveness of the method.
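The self-expression layer mentioned above can be illustrated with a minimal sketch. Under one common formulation (a least-squares self-expression with a ridge penalty; the thesis's exact loss and network architecture are not specified here), the n×n coefficient matrix C that rebuilds each latent sample from the others has a closed form, and its symmetrized absolute value serves as the affinity matrix for spectral clustering. The function names and the regularization weight `lam` are illustrative assumptions, not details fixed by the thesis.

```python
import numpy as np

def self_expression(Z, lam=1.0):
    """Least-squares self-expression with a ridge penalty (illustrative):
    minimise ||Z - C @ Z||_F^2 + lam * ||C||_F^2 over the n x n matrix C.
    The closed form is C = G @ inv(G + lam * I), with G = Z @ Z.T."""
    n = Z.shape[0]
    G = Z @ Z.T                                   # Gram matrix of latent codes
    return G @ np.linalg.inv(G + lam * np.eye(n))

def affinity(C):
    """Symmetrise |C| into the affinity matrix fed to spectral clustering."""
    A = np.abs(C)
    return 0.5 * (A + A.T)

# Toy latent codes: two groups lying near orthogonal directions.
rng = np.random.default_rng(0)
Z = np.vstack([
    np.array([5.0, 0.0, 0.0]) + rng.normal(0, 0.1, (5, 3)),
    np.array([0.0, 5.0, 0.0]) + rng.normal(0, 0.1, (5, 3)),
])
A = affinity(self_expression(Z))
# Within-group affinities dominate cross-group ones, so spectral
# clustering on A can recover the two groups.
```

Note that C is n×n, which is exactly the memory bottleneck the abstract points out: doubling the number of samples quadruples the size of the self-expression matrix.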
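The k-nearest-neighbor extension to out-of-sample points can likewise be sketched: cluster a small batch first, then label every remaining point by a majority vote among its k nearest neighbors inside that clustered batch. The function name, the Euclidean metric, and k=5 below are illustrative choices, not details fixed by the thesis.

```python
import numpy as np
from collections import Counter

def knn_extend(batch_X, batch_labels, new_X, k=5):
    """Propagate cluster labels from a clustered mini-batch to
    out-of-sample points via k-nearest-neighbour majority vote."""
    out = []
    for x in new_X:
        dists = np.linalg.norm(batch_X - x, axis=1)   # Euclidean distances
        nearest = np.argsort(dists)[:k]               # indices of k closest
        votes = Counter(batch_labels[nearest].tolist())
        out.append(votes.most_common(1)[0][0])
    return np.array(out)

# Toy example: a labelled batch drawn from two clusters, then new points.
rng = np.random.default_rng(1)
batch_X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])
batch_labels = np.array([0] * 20 + [1] * 20)
new_X = np.array([[0.1, -0.2], [3.9, 4.1]])
pred = knn_extend(batch_X, batch_labels, new_X, k=5)
# pred assigns the first point to cluster 0 and the second to cluster 1
```

This keeps the expensive self-expression step confined to the mini-batch, so only the cheap nearest-neighbor search touches the full dataset.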
Keywords/Search Tags: Clustering, Deep Learning, Auto-Encoder, Subspace Clustering, Pseudo-supervision, Similarity Learning