Font Size: a A A

The Research And Application Of Spectral Clustering Algorithm Based On Neighbor Similarity Graph

Posted on:2020-03-14Degree:MasterType:Thesis
Country:ChinaCandidate:Y C LiuFull Text:PDF
GTID:2428330578464284Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The spectral clustering is a clustering algorithm based on the theory of spectral partitioning,and it is a popular method due to its superior performance in the data sets with non-convex clusters.The similarity graph in spectral clustering usually directly affects the similarity between the data points.And it will further affect the clustering performance of spectral clustering algorithms.To address this issue,we propose two different spectral clustering algorithms based on neighbor similarity graph,and one of them is a semi-supervised algorithm.Finally we combine the two algorithms.And then a new algorithm is obtained,next we apply it to text clustering.The main work of this paper is as follows:First,we propose a new algorithm: semi-supervised spectral clustering based on density adaptive neighbor similarity graph(DAN-SSC).The proposed algorithm first spreads pairwise constrained prior information to the entire space of clustering,and then it reasonably takes advantages of this information to guide the construction process of affinity matrix in DAN algorithm.Next the proposed algorithm will continue to complete the process of normalized spectral clustering.At last,it is necessary to check whether the results of previous clustering satisfy the prior information,and if they are not satisfied,we need to modify them.The results of experiments show that the proposed algorithm makes full use of the labeled data and it avoids the situation that a small amount of labeled data may mislead the clustering process.The performance of DAN-SSC algorithm is better than the performance of unsupervised spectral clustering algorithms in experimental data.Second,we propose the other algorithm: spectral clustering based on natural nearest neighbor similarity graph(NSG-SC).Natural nearest neighbor is a relatively new concept of nearest neighbor,which has many advantages.We innovatively let it guide the construction process of similarity graph in spectral clustering algorithms.The results of experiments show that compared with traditional neighbor relation graphs,natural nearest neighbor similarity graph can more accurately reflect the similarity relationship between samples.For traditional spectral clustering algorithms,it is difficult to divide the data sets with arbitrary shaped clusters and varying density.NSG-SC algorithm also improves these disadvantages.Finally,the two proposed algorithms in previous are merged,and then semi-supervised spectral clustering based on natural nearest neighbor similarity graph(NSG-SSC)is obtained.Next we parallelize NSG-SSC algorithm,then apply it to text clustering in real data sets.Text data sets usually have high dimensions and complex distribution,but spectral clustering algorithms are just good at dealing with such data sets.And parallelizing the algorithm can greatly speed up it.The parallelization platform of NSG-SSC algorithm is the second generation distributed platform named Spark.It based on a programming thinking named MapReduce and can greatly improve the speed of large data processing through the memory computing power.The results of experiments show that NSG-SSC algorithm can be successfully applied to text clustering after parallelization,and it has achieved good results.
Keywords/Search Tags:Spectral Clustering, Semi-Supervised Learning, Natural Nearest Neighbor, Similarity Graph, Text clustering, Distributed System, Spark
PDF Full Text Request
Related items