
Research On Multi-manifold Embedded Subspace Clustering Method

Posted on: 2020-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: D S Ye
Full Text: PDF
GTID: 2428330575970795
Subject: Applied Mathematics
Abstract/Summary:
In the era of big data, processing and representing high-dimensional data is an important research topic in data science. Extracting effective features from the intrinsic distribution of high-dimensional data is a basic step in carrying out data mining tasks. However, traditional data mining methods designed for low-dimensional data produce much larger errors when modeling high-dimensional data, because the data are sparse and the distances between data points under the Euclidean metric become similar. Therefore, complex models are often used to approximate high-dimensional datasets and extract more accurate intrinsic information. The relationships between data points are then constructed from this information, and the final data mining goal is achieved by processing and reconstructing these relationships.

Sparse subspace representation is a common method for obtaining intrinsic subspace information from high-dimensional datasets by approximating their subspace representation within the framework of compressed sensing. Based on the sparse representation matrix of the data, an undirected graph that represents the structure of the dataset is built and then used to cluster the high-dimensional data via spectral segmentation. Subspace techniques have gained much attention for their remarkable efficiency in representing high-dimensional data. Among them, sparse subspace clustering (SSC) and low-rank representation (LRR) are two commonly used prototypes in pattern recognition, computer vision and signal processing. Both aim to construct a block matrix via a linear representation of the data, so that the data are embedded into a linear subspace. However, few real-world datasets satisfy the linear subspace assumption, and the resulting representation error leads to a large number of misclassified samples. Therefore, this thesis constructs a subspace clustering model on multi-manifolds under the framework of sparse representation. The main work consists of the following two contributions.

First, a locally linear neighborhood graph is introduced to characterize the local manifold structure of a dataset. At the same time, a global low-rank representation with Frobenius norm minimization is constructed under the constraint of local manifold embedding, yielding a novel low-rank local embedding representation (LRLER) model. With this model, the clusters of a dataset are treated as sub-manifolds embedded in a low-dimensional manifold subspace. The low-rank representation is obtained simultaneously in the low-dimensional embedding space, and both the local and the global manifold structures of the dataset are captured. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method outperforms state-of-the-art global subspace clustering approaches. Moreover, the parameters of LRLER are analyzed experimentally to recommend an empirical parameter selection strategy.

Second, this thesis studies the neighborhood confounding problem in the LRLER method, which is caused by overlapping manifolds when modeling manifold information. The problem arises mainly because structural information between the sub-manifolds is missing during modeling: the neighbor set used to construct local information is selected solely according to the distance between data points. Such mixed neighborhoods introduce interference from heterogeneous neighbors when modeling local relationships, which in turn degrades the representation of the dataset. Exploiting the properties of the local tangent space, this thesis uses the angle between two local tangent spaces built by local PCA to describe the class information of neighboring points, and constructs a local optimization problem that ensures the weights between a point and its homogeneous neighbors are larger than the weights between the point and its heterogeneous neighbors. The mathematical derivation and proof are then given to obtain the numerical solution of the optimization problem, in which a generalized cosine metric describes the angle between local tangent spaces. Finally, the correction weight is added to the LRLER model to construct the neighbor-adjusting low-rank local embedded representation (NA-LRLER) method. A large number of comparative experiments were carried out on artificial and real datasets, and the variation of the coefficients in hybrid neighborhoods was analyzed. The clustering accuracy and a parameter analysis of the corrected algorithm are reported.
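
The tangent-space comparison described in the second contribution can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `local_tangent_basis` and `tangent_cosine`, the parameters `k` and `d`, and the toy data are hypothetical. The abstract also does not define its generalized cosine metric, so the product of the cosines of the principal angles is used here purely as an assumed stand-in.

```python
import numpy as np

def local_tangent_basis(X, idx, k, d):
    """Estimate an orthonormal basis (D x d) of the tangent space at X[idx]
    by PCA on the point and its k nearest neighbors (Euclidean distance)."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    nbrs = np.argsort(dists)[:k + 1]          # the point itself plus k neighbors
    patch = X[nbrs] - X[nbrs].mean(axis=0)    # center the local patch
    # Rows of vt are the principal directions of the patch.
    _, _, vt = np.linalg.svd(patch, full_matrices=False)
    return vt[:d].T

def tangent_cosine(U1, U2):
    """ASSUMED similarity: product of the cosines of the principal angles
    between span(U1) and span(U2). The singular values of U1^T U2 are those
    cosines when U1 and U2 have orthonormal columns."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.prod(np.clip(s, 0.0, 1.0)))

# Toy data (hypothetical): two 1-D manifolds in R^3, one along x, one along y.
t = np.linspace(1.0, 2.0, 20)
zeros = np.zeros_like(t)
line_x = np.stack([t, zeros, zeros], axis=1)  # rows 0..19
line_y = np.stack([zeros, t, zeros], axis=1)  # rows 20..39
X = np.vstack([line_x, line_y])

U_a = local_tangent_basis(X, 0, k=5, d=1)     # point on the x-line
U_b = local_tangent_basis(X, 10, k=5, d=1)    # another point on the x-line
U_c = local_tangent_basis(X, 25, k=5, d=1)    # point on the y-line

same_line = tangent_cosine(U_a, U_b)          # tangent spaces coincide
cross_line = tangent_cosine(U_a, U_c)         # tangent spaces are orthogonal
print(same_line, cross_line)
```

In a scheme of this kind, a high similarity suggests that two neighboring points lie on the same sub-manifold and a low similarity suggests a heterogeneous neighbor. How NA-LRLER turns this quantity into a correction weight is not specified in the abstract.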
Keywords/Search Tags: Multi-manifold learning, Subspace clustering, Low-rank representation, Local linear embedding, Local tangent space analysis