Font Size: a A A

Research And Implementation Of Backtracking-based Distributed Subgraph Enumeration On Large-scale Data Graphs

Posted on:2021-03-08Degree:MasterType:Thesis
Country:ChinaCandidate:W W HuFull Text:PDF
GTID:2428330647951044Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Given a large data graph and a small pattern graph,the task of subgraph enumeration is to find all the subgraphs of the data graph that are isomorphic to the pattern graph.Subgraph enumeration is a fundamental query problem in many graph analysis applications.The distributed subgraph enumeration algorithms on large-scale data graphs have high communication and computation costs,finding efficient subgraph enumeration methods on large-scale data graphs is vital for academic and actual applications.The existing distributed subgraph enumeration algorithms can be divided into two groups: DFS-style and BFS-style.The DFS-style algorithms blindly replicate the data graph to each machine.When the size of the data graph increases or the pattern graph becomes complicated,the number of replicated edges grows dramatically,causing the DFS-style algorithms difficult to scale to large-scale data graphs or complex pattern graphs.The BFS-style algorithms turn subgraph enumeration into a distributed multiway join problem.However,they are inefficient as they have to shuffle partial matching results during the join.The number of these partial matching results may be much larger than the size of the data graph itself.Some BFS-style algorithms also require non-trivial costs on constructing and maintaining extra indexes.To overcome the drawbacks of the existing algorithms,the main contribution of this thesis can be summarized as follows:(1)This thesis proposes a new backtracking-based framework BENU to solve distributed subgraph enumeration on large-scale static data graphs.BENU divides a subgraph enumeration task into a group of local search tasks that can be executed in parallel.Each local search task takes a data graph vertex as the start vertex and follows a backtracking-based execution plan to enumerate all matches of the pattern graph in the data graph.The data graph is stored as adjacency sets in a distributed database and is queried as needed based on the on-demand shuffle technique.BENU only queries necessary edges of the data graph during the enumeration and avoids shuffling partial matching results.Besides,BENU does not need to construct any extra index.(Chapter 3)(2)This thesis proposes a search-based best execution plan generation method.The method includes a series of execution plan optimization techniques for reducing the execution cost of an execution plan,a cost estimation model for finding the best execution plan with the least execution cost and two pruning techniques for reducing the search space of the method.(Chapter 3)(3)This thesis develops the local database cache technique and task splitting technique for BENU.The local database cache technique sets up an in-memory cache to store the adjacency sets fetched from the distributed database,taking advantage of the inter-task and intra-task locality and significantly reducing the communication cost.The task splitting technique splits a local search task into a series of subtasks,effectively solving the skewed workloads of local search tasks.(Chapter 3)(4)This thesis further extends BENU to dynamic data graphs and proposes the Streaming-BENU framework to solve the continuous subgraph enumeration problem.Streaming-BENU finds out incremental matching results directly from the edge update stream of the data graph by enumerating isomorphic subgraphs of incremental pattern graphs.Streaming-BENU only stores the dynamic data graph and does not need to store any partial matching result during the enumeration.(Chapter 4)(5)This thesis evaluates the performance of BENU and Streaming-BENU on large-scale real-world data graphs.The experimental results show that the performance of BENU and Streaming-BENU outperform all the state-of-the-art distributed(continuous)subgraph enumeration methods,especially on complex pattern graphs.(Chapter 5)...
Keywords/Search Tags:Subgraph isomorphism, Subgraph matching, Continuous subgraph matching, Task parallel, Backtracking-based search
PDF Full Text Request
Related items