| With the growing variety of data collection methods and the exponential growth of data scale, effectively exploiting massive high-dimensional data has become an urgent problem. In many practical applications, high-dimensional data can be well approximated by a union of low-dimensional subspaces, where each subspace corresponds to a class or category. The problem of segmenting a set of data points according to the subspaces they belong to is called subspace clustering, which is widely used in motion segmentation, image clustering, and hybrid system identification. However, the complex distribution of real data, the size of the data set, the density of data points within each subspace, and the influence of noise and outliers make it a challenging problem. Among existing methods, sparse subspace clustering has become extremely popular owing to its theoretical guarantees and empirical success. Existing work, however, suffers from scalability and connectivity problems, which result in poor clustering accuracy and low computational efficiency. To address these problems, the thesis makes the following three contributions: 1. By introducing Dropout into the self-representation model to randomly drop out data points, the thesis proposes a Stochastic Sparse Subspace Clustering model. By simplifying the optimization problem, the thesis designs a scalable algorithm based on Orthogonal Matching Pursuit that solves a group of small-scale problems independently and then integrates their optimal solutions. Experiments verify that the proposed algorithm improves scalability and, because connectivity is improved, clustering accuracy as well. 2. Stochastic sparse subspace clustering is reformulated as a consensus optimization problem, and the thesis introduces a penalty function to solve it, deriving the Consensus Orthogonal Matching Pursuit algorithm. Within the Consensus Orthogonal Matching Pursuit algorithm, the thesis derives the Damped Orthogonal Matching Pursuit
algorithm to quickly solve a set of small-scale sparse problems. The thesis conducts extensive experiments, including hyperparameter analysis, connectivity analysis, and convergence analysis, on synthetic and real-world data sets. The experimental results verify the scalability and effectiveness of the algorithm. 3. The thesis gives an asymptotic analysis of the stochastic self-representation model and proves theoretically that introducing Dropout into the self-representation model is equivalent to applying squared ℓ2-norm regularization to the self-representation coefficients. This theoretical result shows that introducing Dropout helps to obtain a denser optimal solution, thereby alleviating the connectivity problem. The thesis then proposes a new elastic net subspace clustering model, develops an efficient greedy algorithm for it, and verifies it experimentally. |
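
The first contribution above (drop data points at random, solve several small sparse problems independently, then integrate the solutions) can be illustrated with a minimal sketch. This is not the thesis's algorithm. It uses plain Orthogonal Matching Pursuit, assumes that "integrating" means averaging the coefficient matrices, and omits the usual Dropout rescaling of kept columns. The function names `omp` and `dropout_self_representation` and the parameters `k`, `drop_rate`, and `n_subproblems` are hypothetical.

```python
import numpy as np


def omp(X, y, k):
    """Plain Orthogonal Matching Pursuit: approximate y with at most k columns of X."""
    n = X.shape[1]
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        if support:
            corr[support] = -1.0  # never reselect a chosen column
        j = int(np.argmax(corr))
        if corr[j] <= 1e-12:  # residual is already (numerically) explained
            break
        support.append(j)
        # Refit all selected coefficients by least squares on the current support.
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    c = np.zeros(n)
    c[support] = coef
    return c


def dropout_self_representation(X, k=3, drop_rate=0.5, n_subproblems=10, seed=0):
    """Average OMP self-representations over random column subsets.

    X has one data point per column. Averaging is an assumption made for this
    sketch; the thesis's integration step may differ.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    C = np.zeros((n, n))
    for _ in range(n_subproblems):
        keep = rng.random(n) >= drop_rate  # Dropout: keep each point with prob. 1 - drop_rate
        for j in range(n):
            mask = keep.copy()
            mask[j] = False  # a point must not represent itself
            idx = np.flatnonzero(mask)
            if idx.size == 0:
                continue
            C[idx, j] += omp(X[:, idx], X[:, j], min(k, idx.size))
    return C / n_subproblems
```

Because each subproblem sees a different random subset of points, the averaged coefficient matrix tends to spread weight over more neighbours than a single OMP run would, which is the intuition behind the improved connectivity described above. An affinity matrix such as `np.abs(C) + np.abs(C).T` could then be passed to spectral clustering.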