Font Size: a A A

Attributed Graph Clustering Using Approximate Effective Resistance

Posted on:2019-09-08Degree:MasterType:Thesis
Country:ChinaCandidate:Kwabena A.Antwi-BoasiakoFull Text:PDF
GTID:2428330545465712Subject:COMPUTER TECHNOLOGY
Abstract/Summary:PDF Full Text Request
Clustering is an important tool for understanding data.In clustering,we aim to sepa-rate data into groups so that points in the same group are more similar to one another than they are to points outside the group.When dealing with graph or network data,there are relationships(represented by links or edges)between points that explicitly characterize similarity or dissimilarity.However,there may be other information about the nodes from which we can deduce relationships that are not explicitly stated or realized.For example,judging from the fields in which a group of authors publish and the content of their publications,we may infer a similarity or relationship even when that similarity has not yet resulted in them publishing together.When the nodes of a graph have extra information from which we can infer relation-ships between them,we refer to the graph as an attributed graph.There has been increased attention on this paradigm of clustering in recent times because of the increased availabil-ity of such data.Some of the issues that arise when we attempt to perform clustering in this setting relate to how to effectively combine the views,how to manage large volumes of data,and how to infer the relative importance of different views and attributes.In this thesis,we address all of these problems in part.We propose some methods for converting attribute data into similarities which can then be combined with the structural information in a graph.We compare these methods with cosine similarity which is a widely used method for this task and discuss the relative strengths and weaknesses of the methods.We also extend an approximate spectral clustering algorithm from the single-view clustering problem to the attributed graph setting.This algorithm has the advantage of being able to handle larger graphs than traditional spectral algorithms that do not sample the data to reduce its size nor approximate eigenvectors.It first embeds the graph into a space where the distance between points is the effective resistance between them when the graph is considered as an electrical resistor network.Using an exemplar from a class of fast solvers for symmetric diagonally dominant matrices and a method proposed by Spielman and Srivastava[28],Khoa and Chawla[17]show how the single-view clustering problem can be solved approximately.By extending their methods,we show through experiments on real-world datasets that the proposed methods are effective for addressing the attributed graph clustering problem.We also propose a k-medoids-based algorithm that uses a distance which integrates the different distances on the structural and attribute views of the data into one distance measure which is then used for clustering.The experimental results show that this method is effective for clustering small datasets with hundreds of nodes.
Keywords/Search Tags:similarity metrics, attributed graph clustering, effective resistance
PDF Full Text Request
Related items