Font Size: a A A

Sampling Algorithms for Big Graph Analytic

Posted on:2019-12-03Degree:Ph.DType:Dissertation
University:Drexel UniversityCandidate:Han, GuyueFull Text:PDF
GTID:1478390017489649Subject:Electrical engineering
Abstract/Summary:
The analysis of large graphs offers new insights into social and other networks, and thus is of increasing interest to marketeers, sociologists, mathematicians and computer scientists. However, the extremely large size of most graphs of interest renders them difficult to analyze because of at least four challenges: lack of memory, restricted access to the full graph, prohibitive computational cost and real-time changes in the graph. This dissertation presents graph sampling as a powerful and attractive approach to meet the above challenges, whereby properties of the full graph are estimated based on an examination of only a small portion of the graph.;In this dissertation, we focus on two graph sampling strategies: edge-based sampling and traversal-based sampling. For edge-based sampling, we propose an edge-based sampling framework for big-graph analytics in dynamic graphs. It enhances the traditional model by enabling the use of additional related information. To demonstrate the advantages of our proposed framework, we present a new sampling algorithm which provides an unbiased estimate of the total number of triangles in a fully dynamic graph where both edge additions and deletions are considered. Our algorithm addresses three of the aforementioned challenges; it has low memory and computational costs, and can be applied to dynamic graphs. In particular, it offers a significantly improved performance in real time estimates compared to current state-of-the-art methods.;We also propose several traversal-based graph sampling algorithms for the estimation of a micro-structural property (motif statistics) and the estimation of a macro-structural property (the two largest eigenvalues of the graph). All of these algorithms solve the challenges of prohibitive computational and storage costs, and restricted access. For micro-structural property, we develop a new sampling algorithm which estimates the concentration of mo- tifs of any size via random walk. Unlike previous approaches which enumerate subgraphs around the random walk to find motifs, our algorithm achieves its computational efficiency by using a randomized protocol to sample subgraphs in the neighborhood of the nodes visited by the walk. The experimental results show that our algorithm achieves better accuracy and higher precision than previously known algorithms. For macro-structural property, we propose a series of new sampling algorithms which estimate the top eigenvalues of a graph. Unlike previous methods which try to collect a subgraph with the most influential nodes, our algorithms achieve estimates of the two largest eigenvalues by estimating the number of closed walks of a certain length. The experimental results show that our algorithms are much faster and achieve higher accuracy on most graphs than previously known algorithms seeking to address the same challenges.
Keywords/Search Tags:Graph, Algorithms, Sampling, Challenges, New
Related items