
Research On Distributed Heterogeneous Graph Clustering Algorithms

Posted on: 2020-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
GTID: 2428330572496573
Subject: Computer Science and Technology
Abstract/Summary:
Graphs are among the most common data structures in computer science for characterizing the intrinsic relationships among multi-source heterogeneous data. Many datasets, including social media data, can be modeled as graphs, and clustering such graphs provides useful insight into the structure of the data. To improve clustering quality, vertex attributes can be taken into account, resulting in attributed graphs. However, existing attributed graph clustering methods generally treat attribute similarity and structural similarity separately. In addition, the rapidly growing volume and variety of data require graph clustering algorithms to become more efficient and scalable. This thesis therefore studies distributed heterogeneous graph clustering algorithms that measure vertex attribute similarity and graph structure similarity in a unified way. It represents attributed graphs as heterogeneous graphs, which enables the use of Personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarity. To make the vertex similarity measure more efficient, the thesis develops four parallel PPR approaches for computing PPR scores. To cluster the vertices of the heterogeneous graph effectively, it proposes a neighbor-graph-based parallel DBSCAN clustering method and improves it with an optimization algorithm based on a skeleton of core vertices. To boost clustering effectiveness, it proposes an entropy-based edge weight update strategy that updates edge weights iteratively to balance the importance of different attributes. Extensive experiments on real-life datasets offer insights into the effectiveness, efficiency, and scalability of the proposed approaches.
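The idea of using PPR as a unified measure can be illustrated with a small, single-machine sketch. The abstract does not say how the heterogeneous graph is constructed or how the four parallel PPR approaches work, so everything below is an assumption made for illustration: each distinct attribute value becomes an extra vertex linked to the vertices that carry it, and PPR is computed by plain power iteration. The names `build_heterogeneous_graph`, `personalized_pagerank`, `attr_weight`, and `alpha` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def build_heterogeneous_graph(edges, attributes, attr_weight=1.0):
    """Build a weighted, undirected adjacency map {node: {neighbor: weight}}.

    Assumption (not from the thesis): every (attribute, value) pair becomes
    an extra vertex connected to each original vertex that has that value.
    """
    adj = defaultdict(dict)
    for u, v in edges:  # structural edges
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    for vertex, attrs in attributes.items():  # attribute edges
        for name, value in attrs.items():
            attr_node = ("attr", name, value)
            adj[vertex][attr_node] = attr_weight
            adj[attr_node][vertex] = attr_weight
    return dict(adj)


def personalized_pagerank(adj, source, alpha=0.15, iterations=100):
    """Approximate PPR scores from `source` by power iteration.

    alpha is the restart probability; the scores always sum to 1.
    """
    scores = {node: 0.0 for node in adj}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        nxt[source] += alpha  # restart mass returns to the source
        for node, mass in scores.items():
            total = sum(adj[node].values())
            if total == 0.0:  # dangling node: send its mass back to the source
                nxt[source] += (1.0 - alpha) * mass
                continue
            for nbr, weight in adj[node].items():
                nxt[nbr] += (1.0 - alpha) * mass * weight / total
        scores = nxt
    return scores


# "b" and "c" are structurally identical neighbors of "a",
# but only "b" shares an attribute value with "a".
edges = [("a", "b"), ("a", "c")]
attributes = {"a": {"city": "X"}, "b": {"city": "X"}, "c": {"city": "Y"}}

hetero = build_heterogeneous_graph(edges, attributes)
scores = personalized_pagerank(hetero, "a")

plain = build_heterogeneous_graph(edges, {})
plain_scores = personalized_pagerank(plain, "a")
```

In this toy graph the structure alone cannot distinguish `b` from `c`, while on the augmented graph `b` scores higher than `c` because it also reaches `a` through the shared attribute vertex. Changing `attr_weight` shifts the balance between structure and attributes; in this sketch it is a fixed constant and is not updated iteratively.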
Keywords/Search Tags: Parallel computation, Heterogeneous graphs, Personalized PageRank, DBSCAN clustering, Edge weight update