
Research On Spectral Clustering Of Large Scale Complex Data

Posted on: 2018-09-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Jia
GTID: 1318330539975103
Subject: Computer application technology

Abstract/Summary:
Spectral clustering is rooted in graph theory. Given a data set, we can construct an undirected weighted graph in which the vertices represent data points and each edge carries a similarity weight. Clustering the data points into multiple classes is then equivalent to partitioning the graph into several subgraphs, with the goal of maximizing the connection weights within each subgraph while minimizing the connection weights between different subgraphs. Nowadays data has become a valuable resource, and spectral clustering, as an effective data analysis tool, is able to uncover the intrinsic relationships between objects. Owing to its solid theoretical foundation and good clustering performance, spectral clustering has been successfully applied in many fields. Despite these advantages, it faces high time and space complexity when dealing with large-scale complex data. Spectral clustering is still developing, and many problems remain open for further study. This thesis focuses on the shortcomings of existing spectral clustering algorithms and studies corresponding improvements for high-dimensional data, data with complex structure, and large-scale data. The specific research contents are as follows:

1. Spectral clustering based on rough set attribute optimization. Traditional spectral clustering algorithms treat all attributes of a data set equally when processing high-dimensional data, which makes them vulnerable to noise and irrelevant attributes. We therefore attach a weight parameter to each attribute so that different attributes play different roles in clustering, measure attribute importance by the knowledge entropy of rough sets, and propose an attribute-weighted spectral clustering model based on knowledge entropy. Furthermore, since classical rough sets cannot handle continuous data directly, we combine the neighborhood rough set with information entropy to analyze the attributes and propose a spectral clustering model based on neighborhood rough set reduction. This makes full use of the information contained in each attribute and selects the attributes that contribute most to clustering.

2. Spectral clustering based on local density. How to describe the affinity between data points reasonably is crucial to spectral clustering. To reflect the complex structure of the data objectively, a density-sensitive similarity measure is designed; it increases the similarities between data points in high-density regions and reduces the similarities between data points in low-density regions, leading to a density-adaptive affinity propagation spectral clustering model. In addition, for unbalanced data distributions we propose a self-tuning p-spectral clustering model based on common neighbors: neighborhood information is added to the similarity function, and the distances between data points are automatically adjusted according to the number of common neighbors, which preserves the global consistency of the data (an illustrative sketch of such a common-neighbor-adjusted similarity is given below).
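Purely as an illustration of the idea in point 2, and not the exact formulas from the thesis, the sketch below builds a Gaussian affinity whose scale is relaxed when two points share many k-nearest neighbors, so points inside the same dense region become more similar. The function name common_neighbor_affinity and the specific weighting factor 1 + cn are assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def common_neighbor_affinity(X, k=10):
    """Illustrative sketch: Gaussian affinity whose scale grows with the
    number of shared k-nearest neighbors, so distances inside a dense
    region are effectively shrunk and affinities there are boosted."""
    D = squareform(pdist(X, "euclidean"))          # pairwise distances (n x n)
    n = D.shape[0]

    # k-nearest-neighbor sets (column 0 is the point itself, so skip it)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], knn] = True

    # cn[i, j] = number of common k-nearest neighbors of points i and j
    M = member.astype(int)
    cn = M @ M.T

    # self-tuning local scale: distance to each point's k-th nearest neighbor
    sigma = np.maximum(D[np.arange(n), knn[:, -1]], 1e-12)

    # more shared neighbors -> larger effective scale -> higher affinity
    S = np.exp(-D ** 2 / (sigma[:, None] * sigma[None, :] * (1.0 + cn)))
    np.fill_diagonal(S, 0.0)
    return S
```

The resulting matrix can be handed to any standard spectral clustering routine, for example scikit-learn's SpectralClustering with affinity="precomputed".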
3. Spectral clustering based on Nystrom low-rank approximation. The time complexity of computing the eigenvectors of the Laplacian matrix is very high, which restricts the application of spectral clustering to large-scale data. The Nystrom method simplifies the eigenvalue problem in spectral clustering through a sampling and approximation strategy. To accelerate the Nystrom approximation, randomized SVD is used for the eigen-decomposition of the sampled sub-matrix, yielding a Nystrom spectral clustering model based on randomized SVD. To avoid blind sample selection, an incremental sampling method is designed that chooses appropriate samples according to the probability distribution of the data points, leading to a Nystrom spectral clustering model based on probabilistic incremental sampling (a minimal sketch of the Nystrom idea appears after point 4).

4. Spectral clustering based on approximate weighted kernel k-means. Since spectral clustering and weighted kernel k-means are mathematically equivalent, the objective functions of spectral clustering can be optimized by weighted kernel k-means, which avoids the eigen-decomposition of the Laplacian matrix. However, the space complexity of weighted kernel k-means is still O(n²). To reduce the space requirement of spectral clustering, an approximate weighted kernel k-means algorithm is proposed that restricts the cluster centers to a smaller subspace spanned by the sampled points. Then, within the semi-supervised framework of Hidden Markov Random Fields (HMRF), we propose a semi-supervised approximate spectral clustering algorithm based on the HMRF model and use approximate weighted kernel k-means to solve the resulting semi-supervised spectral clustering problem.
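To make the Nystrom idea in point 3 concrete, here is a minimal sketch of spectral clustering with a Nystrom low-rank approximation of the normalized affinity matrix. It uses plain uniform sampling and a dense eigen-decomposition of the small landmark block rather than the thesis's randomized SVD and probabilistic incremental sampling; the function name nystrom_spectral_clustering and the parameters n_landmarks and sigma are assumptions made for this illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def nystrom_spectral_clustering(X, n_clusters, n_landmarks=200, sigma=1.0, seed=0):
    """Approximate normalized spectral clustering of X (n x d) from m landmark
    points, so no full n x n affinity matrix or eigen-problem is formed."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)

    # C: Gaussian similarities between all points and the landmarks (n x m)
    C = np.exp(-cdist(X, X[idx], "sqeuclidean") / (2.0 * sigma ** 2))
    W_pinv = np.linalg.pinv(C[idx, :])             # pseudo-inverse of the m x m block

    # approximate degrees of the full affinity K ~= C W^+ C^T, then normalize
    d = C @ (W_pinv @ C.sum(axis=0))
    C_norm = C / np.sqrt(np.maximum(d, 1e-12))[:, None]

    # D^{-1/2} K D^{-1/2} ~= F F^T with F = C_norm (W^+)^{1/2}; its leading
    # eigenvectors are the left singular vectors of the thin n x m matrix F
    evals, evecs = np.linalg.eigh(W_pinv)
    F = C_norm @ (evecs * np.sqrt(np.maximum(evals, 0.0)))
    U, _, _ = np.linalg.svd(F, full_matrices=False)

    # k-means on the row-normalized spectral embedding
    E = U[:, :n_clusters]
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(E)
```

With the number of landmarks m much smaller than n, the cost is dominated by the n x m kernel evaluations and an m x m eigen-decomposition rather than an n x n one, which is what makes the approach attractive for large-scale data.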
Keywords/Search Tags:spectral clustering, similarity measure, Laplacian matrix, Nystrom approximation, approximate weighted kernel k-means