Research On Multidimensional Data And Software Clustering Based On Graph Clustering

Posted on:2014-10-23

Degree:Master

Type:Thesis

Country:China

Candidate:X M Xu

GTID:2208330434972853

Subject:Computer software and theory

Abstract/Summary:

As an unsupervised pattern classification method, clustering has broad application prospects in various research areas such as speech recognition, character recognition, data mining, spatial-temporal database applications (GIS), and sequence data analysis. Research on data clustering can be divided into two main directions according to how the data is modeled: multidimensional space vector clustering and graph clustering (also known as community detection). With the continuous growth of the various types of data that need to be clustered, conducting cluster analysis on such massive data efficiently and effectively has become a huge challenge for research on space vector clustering. For graph clustering, the clustering process is closely tied to the particular application background, and cluster visualization varies widely across applications, which makes it impossible to find a single graph clustering method that works universally across application backgrounds. In this thesis, we focus on clustering complex data of two kinds: multidimensional numeric vector data and software module graphs. The work consists of the following two parts. First, to ease the challenge of massive data faced by multidimensional space clustering, we propose the KBAC algorithm, which is based on K-Means clustering. The algorithm uses K-Means as a pre-clustering process and can automatically determine the optimal number of clusters. The key idea is to reduce the multidimensional space clustering problem to community detection on a graph.
We further implemented the algorithm on a cloud platform and proposed an R-tree based optimization. Experimental analysis shows that, when implemented in a cloud framework, the KBAC algorithm can cluster large-scale data efficiently and effectively. Second, because graph clustering varies across application backgrounds, we explore graph clustering as applied to software clustering. We propose an Entry-based and PageRank-based two-stage hierarchical clustering algorithm, together with a naming algorithm for the modules in the clustering result. Moreover, we explore dynamic, granularity-adjustable visualization of software clustering results. Based on the proposed algorithms and designs, we developed a novel software clustering tool prototype called SCuV. The tool extracts the function call graph from a software system's source code and runs the proposed clustering algorithms. A case study shows that the proposed clustering approach is efficient and that the tool presents the hierarchy of software modules in a form that is easier to comprehend, which gives it good application prospects.
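The abstract gives only the outline of KBAC: pre-cluster with K-Means, then treat the result as a graph and find communities in it. The Python sketch below illustrates that outline under several assumptions that are not taken from the thesis: K-Means is run with a deliberately large k, two centroids are linked when they lie within a hypothetical `radius` of each other, and plain connected components stand in for a real community detection algorithm. All function and parameter names are illustrative, and the cloud implementation and R-tree optimization are not modeled.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means with deterministic, evenly spaced seeds."""
    step = len(points) // k  # assumes 1 <= k <= len(points)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each non-empty centroid to the mean of its members.
        for c, members in groups.items():
            centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels


def centroid_graph(centroids, radius):
    """Assumed graph construction: link centroids closer than `radius`."""
    edges = defaultdict(set)
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < radius:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(n, edges):
    """Stand-in for community detection: label connected components."""
    community = [-1] * n
    next_id = 0
    for start in range(n):
        if community[start] != -1:
            continue
        community[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in edges[node]:
                if community[neighbour] == -1:
                    community[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return community


def cluster(points, k, radius):
    """Pre-cluster with K-Means, then merge pre-clusters via the centroid graph."""
    centroids, labels = kmeans(points, k)
    community = components(k, centroid_graph(centroids, radius))
    return [community[label] for label in labels]
```

In this sketch the final number of clusters is never passed in. It falls out of how many groups of linked centroids the graph contains, which is one simple way to read the abstract's claim that the number of clusters is determined automatically.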

Keywords/Search Tags:

K-means, MapReduce, community detection, software clustering

Related items

1	Research On Parallelization Of K-Means Clustering Algorithm Based On MapReduce
2	Research On Community Detection Algorithm Based On Parallel K-Means Clustering
3	Parallel Clustering Algorithm Based On MapReduce
4	Improved K-means Clustering Algorithm Based On MapReduce Framework
5	Research Of K-means Clustering Algorithm Based On MapReduce
6	Research On Accelerating Of K-means Clustering Algorithm Using FPGA Based On MapReduce
7	Research On MapReduce Based Big Data K-means Clustering Algorithm
8	Research On Improved K-means Clustering Algorithm And Its Application
9	Research On Parallelization Of Clustering Algorithm Based On MapReduce
10	Research On Parallelization Of Clustering Algorithm Based On Mapreduce