Font Size: a A A

GANC: Greedy Agglomerative Normalized Cut

Posted on:2012-07-17Degree:M.EngType:Thesis
University:McGill University (Canada)Candidate:Tabatabaei, Seyed SalimFull Text:PDF
GTID:2458390008495797Subject:Engineering
Abstract/Summary:
Graph clustering is a very common problem that arise in various fields; e.g., social science, computer networks, bioinformatics, marketing, ecological networks, and political science. In this thesis, several classes of graph clustering algorithms and ideas are reviewed. Furthermore a novel graph clustering algorithm is presented. The algorithm strives to optimize the normalized cut metric, which has proven to be a meaningful assessment of the quality of a clustering, since it takes into account both the similarity of nodes within the same cluster and the dissimilarities of nodes within different clusters. The clustering algorithm we present, which we call "Greedy Agglomerative Normalized Cut", is fast and scalable to graphs that contain millions of nodes and edges. It consists of three components: a greedy agglomerative hierarchical clustering procedure, a model order selection step (identifying the number of clusters), and a local refinement step. For most of the real-life graphs, the overall complexity of the algorithm is O(n log 2 n), where n is the number of nodes in the graph. The worse case complexity is O(n log 2 n) for very unfavorable graphs that do not usually arise in practical situations. Synthetic and real graphs are used to evaluate the performance with respect to the minimization of normalized cut, the model order selection, and the run-time of the proposed algorithm. Performance analysis indicates that the algorithm proposed in this thesis identifies partitionings that have comparable values of the normalized cut metric to those of the partitionings identified by the much more computationally intensive spectral methods.
Keywords/Search Tags:Normalized cut, Greedy agglomerative, Clustering
Related items