Font Size: a A A

Research On The Method Of Linkage-Based Graph Clustering

Posted on:2009-06-05Degree:MasterType:Thesis
Country:ChinaCandidate:C Y GuoFull Text:PDF
GTID:2178360272463554Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Graph is a general mathematical method of modeling entities and their relationships and is commonly used to describe a wide variety of structural data including the Internet, the Web, social networks, metabolic networks, protein-interaction networks, networks of citations among papers, traffic networks, and many more. In the recent years, with the rapid development of communication, computer and network technologies and the continuous advance in social informationization, there has been an exponentially increasing tendency in the size of the graphs abstractly representing the real world network structures. Consequently, it makes the necessity and challenge of the research on the structural characteristics of the large complex networks become obvious on the one hand, and on the other hand, it has intrigued the enthusiastic interests of many researchers in attempting to investigate the effective clustering approaches to analyze, understand and visualize those sophisticated structures.The fundamental objective of graph clustering is to identify and classify graph nodes into groups with the sparse inter-connectivity and the dense intra-connectivity on the basis of the connectivity of the characteristics they possess. Unlike the data clustering, graph clustering has its own particularity and complexity due to the high sensitivity of many graph characteristics, such as the distance and connectivity between vertices, to the edges existing in the graph. Additionally, since graph distances are often integers, it is very common for vertices to be equivalent to several cluster medoids, which easily results in confliction and incorrect convergence. Therefore, it is impossible to accomplish the objective of graph clustering via a direct application of the existing data clustering approaches to graphs without the consideration of their structural features.The thesis focuses on the study of clustering the connected undirected unweighted graphs. Following the summarization of several typical graph clustering algorithms in section two, we present our research results in five parts:(1) The concepts of linkage between vertices and linkage matrix for a graph are introduced; their definitions and computations are formularized. The efficiency for obtaining the matrix depends primarily on the shortest path algorithm used in the computation.(2) A graph clustering algorithm based on the max-min linkage is proposed. It incorporates the fundamental idea of the max-min distance clustering method with the linkage matrix of a graph and has a timecomplexity of O((k+3)k/2 n) , where k is the expected number of clusters and nis the scale of the graph.(3) The concepts of dissimilarity between nodes and the corresponding dissimilarity matrix for a graph are clearly defined. The matrix can be constructed with a computational complexity of O(n~2).(4) A method of measuring the dissimilarities between the sub-clusters is presented on the premise of the dissimilarity matrix of the graph available. It can be used to generate agglomerative hierarchical clusters of graphs.(5) Examples for all algorithms above are provided and their correctness and effectiveness have been validated experimentally.
Keywords/Search Tags:Graph clustering, The Max-min Linkage Clustering, Agglomerative Hierarchical Clustering
PDF Full Text Request
Related items