Spectral Clustering In Distributed Systems

Posted on:2014-02-04

Degree:Master

Type:Thesis

Country:China

Candidate:C Meng

Full Text:PDF

GTID:2248330398972135

Subject:Detection Technology and Automation

Abstract/Summary:

PDF Full Text Request

Cluster analysis is a basic understanding activity in human life and is a classic problem in machine learning. The clustering algorithms gather different things into a class according to certain attribute of things, so that the similarity between the classes as small as possible, within the class as great as possible. K-means clustering algorithm is a classical and popular algorithm as a center-based clustering algorithm. When the data set distributed for the convex spherical, K-means algorithm has good performance and be able to get a better clustering results. However, when the sample space is convex, K-means algorithm often fails, and the algorithm using iterative optimization method for solving the optimal solution, so the algorithm will fall into the local optimal solution.Recent years, An unsupervised clustering algorithm that spectral clustering algorithm emerge for clustering in the sample space of any shape, overcome the K-means algorithm emerging into a local optimum the shortcomings of the solution, and can converge to the global optimal solution. The algorithm has the ability to identify the sample space of arbitrary shape, and will not fall into the local optimal solution. The application in practical problems is well for the algorithm. However, spectral clustring suffers from a problem in both memory use and computational time when the size of data set is large. To perform clustering on large data sets, We can transplanted the algorithm in a distributed environment, at the same time we use different sparse matrix methods to reduce the use of memory space.This article focuses on how to achieve efficient spectral clustering algorithm based on a distributed environment. Specific content including:1. Thinning similarity matrix using two different methods and compare two different sparse methods. The two methods are t nearest neighbor method and Nystrom method, in order to select a more excellent method; the two methods were compared from different angles. Finally, we found that the use of the t nearest neighbor method can get better clustering results verified by experiments.2. By the t nearest neighbor to the similarity matrix sparse, we can use the MPI model and Google’s MapReduce model to build a distributed environment. After that we transplant spectral clustering algorithm to distributed platforms. The results show that the algorithm is able to fully utilize the resources of each node to improve the operating efficiency of the algorithm and has good scalability. It provides a good solution for the application of massive data.

Keywords/Search Tags:

spectral clustering, distributed computing, Nystromalgorithm

PDF Full Text Request

Related items

1	Parallel Clustering Algorithm Based On MapReduce
2	The Research Of Distributed Parallel Spectral Clustering Algorithm Based On Data Stream
3	Research And Application Of Spectral Clustering
4	Research On Distributed Clustering Algorithm Based On Cloud Computing
5	Research On Spectral Clustering Algorithm And Its Application
6	Parallel Spectral Clustering Algorithm Based On Hadoop
7	The Research And Application Of Spectral Clustering Algorithm Based On Neighbor Similarity Graph
8	A Spectral Clustering Algorithm Based On Membrane Computing Framework And Its Application In Image Segmentation
9	Design Of Distributed Computing Task Clustering Scheduling Algorithm In Heterogeneous Cluster
10	Research Of Underwater Acoustic Image Segmentation Based On Spectral Clustering Technique