Font Size: a A A

Research On Efficient Graph Clustering Algorithm Based On Connections

Posted on:2013-11-23Degree:MasterType:Thesis
Country:ChinaCandidate:H C WangFull Text:PDF
GTID:2248330395451895Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years, with the rapid development of applications such as socialnetworks, communication graphs, biological networks, and more, large amounts ofgraph data are generated each day from these applications. Graph clustering is apowerful technology of graph data analysis. However, for large-scale graph data,most current graph clustering algorithms scale poorly in terms of time or memory.The main reason is that current algorithms find all clusters in the entire graph. In fact,many clustering applications need only the subset of best clusters, and not all clustersin the entire graph. In view of this situation, Macropol and Singh proposed a newtechnique named TopGC (Top Graph Clusters) in2010. TopGC finds the best wellconnected, clique-like clusters within large graphs probabilistically. It is inherentlyparallelizable, and runs in linear time on the graph size. However, TopGC also has itsown disadvantage. Firstly, TopGC cluster the graph starting from the seed nodesbased on single-layer neighborhood, which finds over-fine-grained clusters withinlarge graph. Secondly, for massive graphs, the time and space efficiency of TopGCare expected to be further improved. Therefore, an efficient distributed clusteringalgorithm based on k-layers neighbors that named ITGC (Improved Top GraphClusters) is proposed in this paper. The main contents of this paper are as follows:1) Put forward the concept called k-layers neighbors. The algorithm searchesfor the required number of best clusters based on the similarity of k-layers neighborsand the edge weights among nodes, whose main idea is to avoid over-fine-grainedclusters caused by single-layer neighborhood.2) For several massive graphs, TopGC can not meet the needs of the practicalapplications in terms of time or memory without using the distributed parallelprocessing technology. A distributed clustering algorithm based on cut set isproposed. It adopt a searching strategy of minimum cost cut set based on graphconnectivity, to reduce the association of graph fragments and improve parallelismand scalablity of clustering.3) Note that it is also possible for overlapping clusters to be found. Afterfinding all clusters, a post-processing step is run here. If two clusters overlap morethan a given parameter, the cluster with lower score defined for a cluster is removed. 4) Design an experiment to verify our thought by Java. With the extensiveexperiments on real data, it shows that the proposed algorithm achieves the betterefficiency of time and space and the effective of clusters.
Keywords/Search Tags:Graph clustering, K-layers neighbors, Distributed clustering
PDF Full Text Request
Related items