
Research On Multidimensional Data And Software Clustering Based On Graph Clustering

Posted on: 2014-10-23  Degree: Master  Type: Thesis
Country: China  Candidate: X M Xu  Full Text: PDF
GTID: 2208330434972853  Subject: Computer software and theory
Abstract/Summary:
As an unsupervised pattern classification method, clustering has broad application prospects in various research areas such as speech recognition, character recognition, data mining, spatio-temporal database applications (GIS) and sequence data analysis. Depending on how the data are modeled, research on data clustering can be divided into two main directions: multidimensional space vector clustering and graph clustering (also known as community detection). As the volume of data of all kinds that needs to be clustered keeps growing, clustering such massive data efficiently and effectively has become a major challenge for research on space vector clustering. Graph clustering, in turn, is closely tied to the particular application background, and the requirements of cluster visualization also vary, so no single graph clustering method is universally applicable across application backgrounds. In this paper, we focus on clustering complex data, namely multidimensional numeric vector data and software module graphs. The work consists of the following two parts. First, to ease the challenges that massive data pose for multidimensional space clustering, we propose the KBAC algorithm, which is based on K-Means clustering. The algorithm uses K-Means as a pre-clustering step and can automatically determine the optimal number of clusters. The key idea is to reduce the multidimensional space clustering problem to community detection on a graph.
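The abstract does not spell out how KBAC turns K-Means output into a graph. Below is a minimal sketch of the general idea under our own assumptions: K-Means deliberately over-segments the data into `k` small groups, centroids closer than a hypothetical `radius` are joined by an edge, and label propagation (a standard community detection method swapped in here for illustration; the thesis may use a different one) merges linked centroids. The final number of clusters therefore comes from the community count rather than being fixed in advance. The function names, the farthest-point seeding, the smallest-label tie-break and the `k`/`radius` values are all illustrative and not taken from the thesis; the cloud platform and R-tree optimization are not modeled.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Lloyd's K-Means, used here only as an over-segmenting pre-clustering step."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return centroids, labels


def label_propagation(n, edges, max_rounds=100):
    """Asynchronous label propagation on an undirected graph with nodes 0..n-1.
    Ties are broken by the smallest label so the result is deterministic."""
    neighbors = {v: [] for v in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    labels = list(range(n))
    for _ in range(max_rounds):
        changed = False
        for v in range(n):
            if not neighbors[v]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[u] for u in neighbors[v])
            best = max(counts.values())
            new = min(lab for lab, cnt in counts.items() if cnt == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


def kbac_sketch(points, k=6, radius=3.0):
    """Hypothetical KBAC-style pipeline: over-cluster with K-Means, link nearby
    centroids into a graph, then let community detection merge them."""
    centroids, point_labels = kmeans(points, k)
    edges = [(i, j) for i in range(k) for j in range(i + 1, k)
             if math.dist(centroids[i], centroids[j]) <= radius]
    communities = label_propagation(k, edges)
    # Renumber communities as 0..C-1; C is the automatically determined cluster count.
    ids = {c: i for i, c in enumerate(sorted(set(communities)))}
    return [ids[communities[lab]] for lab in point_labels]
```

For example, on two well-separated 3×3 grids of points (one spanning [0, 1]², the other shifted by 10 in each coordinate), `kbac_sketch(pts, k=6, radius=3.0)` returns two clusters even though K-Means was asked for six. In this sketch the `radius` threshold takes over the role that a fixed `k` plays in plain K-Means.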
We further implemented the KBAC algorithm on a cloud platform and proposed an R-tree-based optimization. Experimental analysis shows that, when implemented in a cloud framework, the KBAC algorithm can cluster large-scale data efficiently and effectively. Second, because graph clustering varies across application backgrounds, we explore graph clustering in the application of software clustering. We propose an Entry-based and PageRank-based two-stage hierarchical clustering algorithm, as well as a naming algorithm for the modules in the clustering result. Moreover, we explore dynamic, granularity-adjustable visualization of software clustering results. Based on the proposed algorithms and designs, we developed a novel software clustering tool prototype called SCuV. The tool can extract a function call graph from a software system's source code and apply the proposed clustering algorithms. A case study shows that the proposed clustering approach is efficient and that the tool presents the hierarchy of software modules in a form that is easier to comprehend, indicating good application prospects.
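The abstract names an Entry-based and PageRank-based two-stage algorithm without giving its details. One plausible reading, sketched below purely as an assumption, is as follows. Stage one treats functions that no other function calls as entries and groups every function by the set of entries that reach it in the call graph. Stage two scores functions with standard PageRank over the same call graph, and those scores could then be used to order or name the members of each group. The call-graph format (`{caller: [callees]}`), the function names and `damping=0.85` are hypothetical choices, not SCuV's actual interface.

```python
from collections import deque


def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank on a directed call graph {caller: [callees]}.
    Rank held by functions that call nothing is spread uniformly over all nodes."""
    nodes = sorted(set(graph) | {c for cs in graph.values() for c in cs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for caller, callees in graph.items():
            for callee in callees:
                new[callee] += damping * rank[caller] / len(callees)
        rank = new
    return rank


def reachable(graph, start):
    """All functions reachable from `start` (including itself), by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        for callee in graph.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def entry_based_modules(graph):
    """Hypothetical stage one: group each function by the set of entry
    functions (functions that are never called) from which it is reachable."""
    nodes = set(graph) | {c for cs in graph.values() for c in cs}
    called = {c for cs in graph.values() for c in cs}
    owners = {v: set() for v in nodes}
    for entry in sorted(nodes - called):
        for v in reachable(graph, entry):
            owners[v].add(entry)
    modules = {}
    for v, entries in owners.items():
        modules.setdefault(frozenset(entries), set()).add(v)
    return modules
```

On a toy graph where `main` calls `parse` and `run`, both of which call `log`, and a separate entry `cli` also calls `log`, the function `log` receives the highest PageRank and lands in its own shared group, because it is reached from both entries.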
Keywords/Search Tags: K-means, MapReduce, community detection, software clustering