
Research On Subspace Clustering Model And Algorithm

Posted on: 2022-10-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Zhang
Full Text: PDF
GTID: 1488306755459574
Subject: Control Science and Engineering
Abstract/Summary:
Cluster analysis is one of the key technologies in the field of data mining. For low-dimensional data, traditional clustering algorithms can achieve satisfactory results. However, with the continuous development of data acquisition technology, data dimensionality has increased sharply, and traditional clustering algorithms are severely constrained by this bottleneck. Designing more efficient and advanced clustering algorithms to meet the needs of high-dimensional data mining has therefore become a research hotspot. It is generally believed that high-dimensional data are embedded in low-dimensional manifolds. The purpose of subspace clustering (SC) is to assign high-dimensional data drawn from different subspaces to the low-dimensional subspaces to which they belong, which is an effective way to achieve high-dimensional data clustering.

In recent years, sparse subspace clustering (SSC), a spectral clustering algorithm based on generalized sparse representation, has attracted wide attention for its superior clustering performance, ease of implementation, and computational efficiency, and it has become a research hotspot within subspace clustering. The core task of sparse subspace clustering is to reveal the true subspace structure of high-dimensional data by constructing a representation model, to obtain the coefficient representation matrix in the low-dimensional subspaces by optimizing that model, and then to construct an affinity matrix that is conducive to accurate clustering. Sparse subspace clustering has been successfully applied in image processing, pattern recognition, and other fields, but many problems remain and there is still considerable room for development. Within the representation-based spectral clustering framework, and aiming at problems in existing models, this dissertation studies the ability to adapt to nonlinear data and suppress large-magnitude noise, efficient algorithm implementation, and model extension and application. The main research results and contributions are as follows:

(1) A robust low-rank kernel subspace clustering (RLKSC) method based on the Schatten p-norm and correntropy is proposed. Considering that high-dimensional data may contain complex noise and nonlinear structure, linear subspace clustering is extended to nonlinear subspace clustering within the SSC framework via the "kernel trick", and the resulting optimization problem is solved efficiently by the Alternating Direction Method of Multipliers (ADMM). Schatten p-norm regularization effectively approximates the rank of the data in the feature space, and a closed-form solution to the corresponding subproblem is given. Using correntropy to measure large-magnitude corruption of the data effectively improves the robustness of the model, and, exploiting the closed-form solutions available in half-quadratic optimization, an efficient algorithm is proposed to solve it. Extensive experimental results on several standard datasets show that the method improves clustering performance remarkably.
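The thesis' exact objective is not reproduced in this abstract. Purely as a hedged, schematic sketch of how the ingredients named above (self-expression in a kernel feature space, a Schatten p-norm low-rank penalty, and a correntropy-induced loss) are typically combined in this line of work, one possible formulation is

\[
\min_{C}\ \sum_{j=1}^{n}\left(1-\exp\!\left(-\frac{\|\phi(x_j)-\Phi(X)c_j\|_2^2}{2\sigma^2}\right)\right)+\lambda\,\|C\|_{S_p}^{p},
\qquad
\|C\|_{S_p}^{p}=\sum_{i}\sigma_i(C)^{p},\quad 0<p\le 1,
\]

where \Phi(X) stacks the feature-space images \phi(x_j) of the samples and c_j is the j-th column of C. The residual term depends on the data only through the kernel matrix K = \Phi(X)^{\top}\Phi(X), so the model can be optimized without explicit feature maps; the actual formulation, constraints, and ADMM splitting used in the thesis may differ.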
(2) A robust low-rank kernel multi-view subspace clustering (RLKMSC) method is proposed. Multi-view data describe a target from different perspectives and thus characterize it more fully, which makes multi-view subspace clustering an important clustering problem. Existing multi-view clustering methods usually solve the original problem through convex relaxation, which often yields only suboptimal solutions. To address this, we follow the approach in (1), combining nonconvex Schatten p-norm (0 < p ≤ 1) regularization with the "kernel trick" and introducing correntropy into the model, so as to better handle the nonlinear structure and non-Gaussian noise in multi-view data. In addition, we design two regularization schemes to learn a joint subspace representation of all views. The model is solved efficiently by a specially designed iterative algorithm that guarantees a closed-form solution at every iteration and greatly simplifies the optimization. Experimental results on five real datasets show that the algorithm achieves good clustering performance and robustness.

(3) A robust multi-view subspace clustering method based on heterogeneous-kernel (multi-kernel) low-rank representation (MKLR-RMSC) is proposed. The model addresses four main tasks: (a) fully mining the complementary information provided by different views in the feature space; (b) modeling the fact that the data in the feature space lie in multiple low-dimensional subspaces; (c) learning representations of all views that are aligned to a common centroid; (d) effectively handling non-Gaussian noise in the data. We apply the weighted Schatten p-norm in the model, which balances the contributions of different singular values while remaining close to the original low-rank assumption. In addition, different predefined kernel matrices are designed for different views, which is more conducive to mining the unique and complementary information of each view. Correntropy is again adopted as a robust measure. Extensive experimental results show that the method is effective and robust.

(4) A robust multi-view subspace clustering method with confidence-based auto-weighting (CLWRMSC) is proposed. When learning a consistent representation of all views, each view may have a different confidence level; moreover, owing to the nonlinearity of the data and noise corruption, different samples within the same view may also have different confidence levels. Most existing methods assign only a single uniform weight to each view, ignoring differences in sample confidence, and may therefore obtain only suboptimal solutions. We propose an adaptive sample-confidence weighting strategy that enables the model to account for the confidence levels of both views and samples when learning the consistent representation of all views. On this basis, an adaptive low-rank multiple-kernel learning (MKL) strategy is designed, and block diagonal regularization (BDR) is applied to the representation matrix learned from each view. Extensive experiments show that the model is an excellent clustering algorithm.
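Two building blocks recur throughout the four contributions: the Schatten p-quasi-norm as a nonconvex low-rank surrogate, and the correntropy-induced (Welsch) loss handled by half-quadratic optimization. The following minimal Python sketch (hypothetical helper names, not code from the thesis) shows both:

```python
import numpy as np

def schatten_p(C, p=0.5):
    """Schatten p-quasi-norm raised to the p-th power: sum_i sigma_i(C)^p.
    For 0 < p <= 1 this penalizes small singular values more aggressively
    than the nuclear norm, giving a tighter surrogate of the rank."""
    sigma = np.linalg.svd(C, compute_uv=False)
    return float(np.sum(sigma ** p))

def half_quadratic_weights(residuals, sigma=1.0):
    """Half-quadratic auxiliary weights for a correntropy (Welsch) loss:
    w_j = exp(-e_j**2 / (2 * sigma**2)). Samples with large residuals get
    weights near zero, which is what suppresses gross, non-Gaussian noise."""
    residuals = np.asarray(residuals, dtype=float)
    return np.exp(-(residuals ** 2) / (2.0 * sigma ** 2))
```

In a half-quadratic scheme these weights are recomputed at each iteration, so the robust loss reduces to a weighted least-squares subproblem with a closed-form solution, which is what keeps every iteration cheap.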
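For concreteness, here is a minimal single-view sketch of the representation-based spectral clustering pipeline the abstract describes: learn a self-expressive coefficient matrix, symmetrize it into an affinity, and run spectral clustering. The coefficient step below uses a plain ridge-regularized self-expression as a simplified stand-in for the kernelized Schatten-p/correntropy models developed in the thesis; all function names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def self_expressive_coefficients(X, lam=0.1):
    """X: (d, n) data matrix with samples as columns.
    Minimize ||X - X C||_F^2 + lam * ||C||_F^2, then zero the diagonal
    so that no sample represents itself."""
    n = X.shape[1]
    G = X.T @ X                                   # n x n Gram matrix
    C = np.linalg.solve(G + lam * np.eye(n), G)   # closed-form ridge solution
    np.fill_diagonal(C, 0.0)
    return C

def subspace_cluster(X, n_clusters, lam=0.1):
    C = self_expressive_coefficients(X, lam)
    W = np.abs(C) + np.abs(C).T                   # symmetric, nonnegative affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed",
                              assign_labels="kmeans",
                              random_state=0).fit_predict(W)
```

Usage would look like labels = subspace_cluster(X, n_clusters=5) for a d x n matrix X whose columns are the samples; the robust models developed in the thesis replace the coefficient-learning step while keeping the same affinity-plus-spectral-clustering back end described in the abstract.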
Keywords/Search Tags: Subspace clustering, low-rank kernel, multi-view data, collaborative representation, Schatten p-norm, correntropy, multi-kernel learning, confidence level