
Subspace Clustering Algorithm Based On Feature Selection And Sparse Representation

Posted on: 2020-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Lu
GTID: 2428330599976456
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of information acquisition and storage technology, data sets generated in real applications are growing not only in sample size but also, dramatically, in feature dimension, which makes their structures and attributes more complex and diverse. Modern research and applications such as computer vision, image processing, multimedia, and pattern recognition need to process and understand large-scale high-dimensional data. Clustering such data, however, remains a challenging problem in data mining. High-dimensional data often lie close to low-dimensional structures that correspond to the classes or categories to which the data belong. Subspace representation has therefore been applied extensively to cluster high-dimensional data and reveal their embedded structure. Representation-based methods that build an affinity matrix have been the most popular subspace clustering approach in recent years and achieve state-of-the-art performance. However, their performance degrades sharply in the presence of redundant high-dimensional features. It is therefore of considerable research and practical value to design subspace clustering algorithms that perform well, run efficiently, and adapt to specific types of data. This thesis studies unsupervised feature selection and representation-based subspace clustering. In summary, its contributions are:

(1) To keep irrelevant features from interfering with learning and causing wasted computation on high-dimensional data, unsupervised feature selection is integrated into the self-representation-based framework for learning the coefficient matrix. A weight factor measures the different contributions of the relevant features, which improves the accuracy of feature selection and the robustness of the data representation.

(2) Following the idea of the grouping effect, a similarity matrix based on the Euclidean distances between samples is introduced into the constraint term on the representation coefficients, to further preserve the spatial neighborhood structure of the input data. On this basis, a new Smooth Representation clustering method based on Feature Selection (SRFS) is proposed. An algorithm based on the alternating direction method of multipliers (ADMM) is derived to optimize the proposed cost function; it yields an analytical solution for each variable and global convergence of the objective function. Experiments on synthetic data and standard databases show that SRFS outperforms state-of-the-art approaches in both accuracy and efficiency.

(3) To cluster real-world temporal data effectively, a temporal Laplacian regularization is adopted, which encodes the sequential relationships in time series data. To capture temporal closeness in more detail, a smoother data representation model is designed, which computes the temporal similarity graph with an exponential variance measurement.

(4) The coefficient matrices produced by sparse subspace clustering (SSC) and low-rank representation (LRR) do not, in general, obey the block diagonal property. A block diagonal constraint is therefore applied to the affinity matrix, which effectively improves the performance of the subsequent spectral clustering. A relaxed, soft block diagonal regularization is used because it is more flexible than the hard constraint in the optimization procedure. Combining the block diagonal constraint with the temporal data representation, a Temporal data Clustering algorithm based on Block Diagonal Regularization (TCBDR) is proposed. An efficient algorithm is derived to solve the resulting problem, together with a theoretical analysis of its convergence and complexity. Experiments on temporal data and on common data sets show that the algorithm achieves high clustering accuracy and running efficiency.
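The two regularizers named in contributions (3) and (4) can be illustrated with a minimal NumPy sketch. It is not the thesis's formulation: the Gaussian weighting, the `window` and `sigma` parameters, and all function names are assumptions chosen for illustration. The temporal term is written as tr(Z L Z^T) for a Laplacian L built over neighbouring time steps, and the soft block diagonal term as the sum of the k smallest eigenvalues of the affinity graph's Laplacian, a relaxation commonly used for this purpose.

```python
import numpy as np


def temporal_laplacian(n, window=2, sigma=1.0):
    """Laplacian of a temporal similarity graph over n ordered samples.

    Assumption (illustrative only): samples i and j are linked when
    0 < |i - j| <= window, with weight exp(-(i - j)^2 / (2 * sigma^2)).
    """
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    W = np.exp(-gap.astype(float) ** 2 / (2.0 * sigma ** 2))
    W[(gap == 0) | (gap > window)] = 0.0   # no self-loops, no distant links
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    return W, L


def temporal_smoothness(Z, L):
    """tr(Z L Z^T); small when columns of Z for neighbouring samples agree."""
    return float(np.trace(Z @ L @ Z.T))


def block_diagonal_penalty(B, k):
    """Sum of the k smallest Laplacian eigenvalues of the affinity B.

    For a symmetric nonnegative affinity this is zero exactly when the
    graph has at least k connected components (k diagonal blocks).
    """
    A = (np.abs(B) + np.abs(B.T)) / 2.0    # symmetrize into a new array
    np.fill_diagonal(A, 0.0)
    L_B = np.diag(A.sum(axis=1)) - A
    eigvals = np.linalg.eigvalsh(L_B)      # returned in ascending order
    return float(eigvals[:k].sum())


if __name__ == "__main__":
    W, L = temporal_laplacian(8, window=2, sigma=1.0)
    Z = np.random.default_rng(0).standard_normal((8, 8))
    print("temporal smoothness:", temporal_smoothness(Z, L))
    print("block diagonal penalty (k=2):", block_diagonal_penalty(Z, 2))
```

In a full method these terms would be added to a self-representation loss and minimized jointly; the sketch only evaluates them for a given coefficient matrix.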
Keywords/Search Tags: Subspace clustering, feature selection, sparse representation, temporal data clustering, block diagonal regularization