
Research On Sparse Subspace Clustering And Its Fast Algorithms

Posted on: 2020-12-06    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J Wang    Full Text: PDF
GTID: 1368330590972936    Subject: Artificial Intelligence and Information Processing
Abstract/Summary:
Clustering analysis is an unsupervised learning task that is widely used in many fields: it can discover the global distribution patterns of data and valuable interrelationships between attributes. With the advance of information technology, the dimensionality of many kinds of data now reaches thousands or more, so clustering analysis for high-dimensional data has become very important. Because of redundant dimensions and complex noise, distance-based similarity metrics often fail on high-dimensional data, and subspace methods are commonly used to analyze it instead. It is generally believed that high-dimensional data are embedded in low-dimensional manifolds; the purpose of subspace clustering is to assign high-dimensional samples drawn from different subspaces to the low-dimensional subspaces they belong to. Removing the influence of noise on the data is another important task in subspace clustering.

In recent years, sparse subspace clustering has received extensive attention. It is a clustering framework built on generalized sparse representation and spectral clustering: its core idea is to design a representation model that can reveal the true subspace structure of high-dimensional data, obtain the coefficient representation matrix in the low-dimensional subspaces, and construct from it a similarity matrix that supports accurate clustering. Sparse subspace clustering has been widely studied and applied in computer vision, image processing, and pattern recognition, but many problems remain to be solved and there is considerable room for improvement. A sketch of this generic pipeline is given below.
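The following is a minimal sketch of the generic pipeline just described; it is not the LRSR, GLAM, or NLRR++ model of this dissertation. Each sample is expressed as a sparse combination of the remaining samples by an l1-regularized regression (scikit-learn's Lasso is assumed as the solver), the coefficient matrix is symmetrized into an affinity, and spectral clustering is run on that affinity. The function name ssc and the value of alpha are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, alpha=0.01):
    """Generic sparse subspace clustering (illustrative sketch).

    X: array of shape (n_features, n_samples), one data point per column.
    Each column x_i is regressed sparsely on the other columns; the
    coefficients form C, which is symmetrized into an affinity W for
    spectral clustering.
    """
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                    # exclude x_i itself
        # min_c ||x_i - X_{-i} c||_2^2 + alpha * ||c||_1
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[:, idx], X[:, i])
        C[idx, i] = model.coef_
    W = np.abs(C) + np.abs(C).T                    # symmetric affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)
```

When the samples lie in a union of well-separated subspaces and alpha is chosen suitably, the nonzero coefficients of each column tend to select samples from the same subspace, yielding the block-diagonal structure that the models in this dissertation aim to strengthen.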
This dissertation carries out targeted research on the model design and fast algorithms of generalized sparse subspace clustering. The main work and innovations of this thesis are as follows.

First, a Low Rank Subspace sparse Representation (LRSR) framework is proposed. High-dimensional data often contain redundant dimensions and gross noise, so subspace recovery and segmentation methods are needed to find the subspaces the samples belong to. Low-rank representation performs well on data with sparse noise, but it is unreasonable to use the noisy data directly as the dictionary. The LRSR model therefore introduces a dictionary variable that is solved jointly with the coefficient representation matrix, giving the model stronger fault tolerance. Under the assumption of disjoint subspaces, LRSR imposes both low-rank and sparse constraints on the coefficient representation matrix, which then exhibits a strong block-diagonal structure and lays the foundation for the subsequent subspace clustering task; corrupted samples in the dataset can also be recovered cleanly. The LRSR model is solved within the ADMM framework, and this thesis gives a convergence guarantee for the algorithm. Experimental results on face images (Extended Yale B) and motion trajectories (Hopkins 155) show that LRSR outperforms existing methods in removing noise interference, recovering the original data, and improving clustering accuracy.

Second, we propose a Global Local Affinity matrix Model (GLAM) based on the global information of the eigengap and the local distances between samples. In graph-based learning, the construction of the data graph (i.e., of the similarity matrix) reveals the relationships among the data, so it is critical to construct a similarity matrix that reflects the shared attributes within a class and the differences between classes. We use the eigengap as a global constraint to resist noise interference while keeping the eigenspace associated with the cluster number stable. Accordingly, we propose to approximate a similarity matrix with an ideal block-diagonal structure by maximizing the eigengap, with the local distances between samples serving as a regularization term that prevents the eigengap from growing so large that the similarity matrix loses its effectiveness. Combining subspace partitioning methods with the GLAM similarity matrix improves the accuracy of subspace clustering, and the GLAM similarity matrix is also applied successfully to graph-based semi-supervised classification. Experimental results on real public datasets verify the effectiveness of the GLAM algorithm.

Third, we propose NLRR++, a non-convex low-rank representation method solved by block coordinate descent. In subspace clustering algorithms based on nuclear-norm optimization, each iteration involves a singular value decomposition of high time complexity, so those algorithms do not scale to large datasets. Within a matrix factorization framework, the non-convex low-rank model is reformulated as a sum of rank-one matrices, and the variables are updated by block coordinate descent. On this basis, the thesis also designs a randomized algorithm that updates the inner variables according to a probability distribution, which further accelerates convergence. The time complexity is reduced from O(n^3) to O(mnd) and the space complexity from O(n^2) to O(mn), where usually d ≪ m ≪ n. Moreover, each iteration involves only simple matrix-vector multiplications, which makes the algorithm convenient to parallelize. NLRR++ converges faster when the update order of the coordinate blocks is designed carefully, and we give a convergence analysis of the algorithm. Because spectral clustering itself does not scale to large datasets, a simpler clustering step is derived from the special form of the non-convex model. Compared with current state-of-the-art methods, NLRR++ is more accurate and more efficient, running 12 times faster than the most advanced non-convex low-rank method. A simplified sketch of the rank-one update appears below.
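The following is a minimal sketch of rank-one block coordinate descent in the spirit of NLRR++, written here for the plain factorization objective min ||X - U V^T||_F^2; the self-expressive term, the regularizers, and the randomized update schedule of the actual NLRR++ model are omitted, and the function and variable names are illustrative. Each sweep refreshes one rank-one component at a time via closed-form least squares, using only matrix-vector products, so no SVD is ever computed.

```python
import numpy as np

def rank_one_bcd(X, d, n_sweeps=50, seed=0):
    """Rank-one block coordinate descent for min ||X - U V^T||_F^2.

    Simplified illustration of the NLRR++ update style: component
    (u_j, v_j) is updated with all other components held fixed, using
    the running residual R = X - U V^T.
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, d))
    V = rng.standard_normal((n, d))
    R = X - U @ V.T                            # running residual
    for _ in range(n_sweeps):
        for j in range(d):
            R += np.outer(U[:, j], V[:, j])    # add component j back
            U[:, j] = R @ V[:, j] / max(V[:, j] @ V[:, j], 1e-12)
            V[:, j] = R.T @ U[:, j] / max(U[:, j] @ U[:, j], 1e-12)
            R -= np.outer(U[:, j], V[:, j])    # remove updated component
    return U, V
```

Each inner update costs O(mn), so one full sweep over the d components costs O(mnd) time, and the residual dominates the O(mn) memory footprint, consistent with the complexities stated above.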
Keywords/Search Tags: Subspace Clustering, Matrix Factorization, Low Rank Representation, Spectral Clustering, Graph Learning