Research On Efficient Graph Clustering Algorithm Based On Connections

Posted on:2013-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:H C Wang

Full Text:PDF

GTID:2248330395451895

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

In recent years, with the rapid development of applications such as socialnetworks, communication graphs, biological networks, and more, large amounts ofgraph data are generated each day from these applications. Graph clustering is apowerful technology of graph data analysis. However, for large-scale graph data,most current graph clustering algorithms scale poorly in terms of time or memory.The main reason is that current algorithms find all clusters in the entire graph. In fact,many clustering applications need only the subset of best clusters, and not all clustersin the entire graph. In view of this situation, Macropol and Singh proposed a newtechnique named TopGC (Top Graph Clusters) in2010. TopGC finds the best wellconnected, clique-like clusters within large graphs probabilistically. It is inherentlyparallelizable, and runs in linear time on the graph size. However, TopGC also has itsown disadvantage. Firstly, TopGC cluster the graph starting from the seed nodesbased on single-layer neighborhood, which finds over-fine-grained clusters withinlarge graph. Secondly, for massive graphs, the time and space efficiency of TopGCare expected to be further improved. Therefore, an efficient distributed clusteringalgorithm based on k-layers neighbors that named ITGC (Improved Top GraphClusters) is proposed in this paper. The main contents of this paper are as follows:1) Put forward the concept called k-layers neighbors. The algorithm searchesfor the required number of best clusters based on the similarity of k-layers neighborsand the edge weights among nodes, whose main idea is to avoid over-fine-grainedclusters caused by single-layer neighborhood.2) For several massive graphs, TopGC can not meet the needs of the practicalapplications in terms of time or memory without using the distributed parallelprocessing technology. A distributed clustering algorithm based on cut set isproposed. It adopt a searching strategy of minimum cost cut set based on graphconnectivity, to reduce the association of graph fragments and improve parallelismand scalablity of clustering.3) Note that it is also possible for overlapping clusters to be found. Afterfinding all clusters, a post-processing step is run here. If two clusters overlap morethan a given parameter, the cluster with lower score defined for a cluster is removed. 4) Design an experiment to verify our thought by Java. With the extensiveexperiments on real data, it shows that the proposed algorithm achieves the betterefficiency of time and space and the effective of clusters.

Keywords/Search Tags:

Graph clustering, K-layers neighbors, Distributed clustering

PDF Full Text Request

Related items

1	Research On Clustering Algorithm For Clusters With Irregular Structure
2	Research Of Clustering Algorithm For Complex Data
3	Clustering Technique With Shared Nearest Neighbors
4	Research On Chameleon Clustering Algorithm Based On Nearest Neighbor
5	Research And Implementation Of Mapreduce-based Graph Clustering Algorithm
6	Research On Improved Density Peak Clustering Methods Based On K-nearest Neighbors
7	Research For Clustering Algorithm For Keywords Based On Coloring Spread
8	The Research And Application Of Spectral Clustering Algorithm Based On Neighbor Similarity Graph
9	Distributed Clustering Of Graph Data And Application To Data Mining For E-commerce
10	Research On Constrained Spectral Clustering Algorithms