Font Size: a A A

Research On Distributed Subgraph Matching Algorithm For Large Scale Graph Data

Posted on:2020-05-08Degree:MasterType:Thesis
Country:ChinaCandidate:W XuFull Text:PDF
GTID:2428330572999304Subject:Engineering
Abstract/Summary:PDF Full Text Request
As a complex data structure,graphs are ideal for expressing intrinsically relevant and closely related data.Subgraph matching technology,as a basic operation that can efficiently query on graph data,also plays an important role in the field of data mining.Due to the explosive growth of graph data size makes it difficult to process subgraph matching on single machine.Although the existing distributed algorithms could solve the problem to some extent,the network communication cost between the distributed computing nodes still affects the performance of the algorithm.To solve this problem,a distributed subgraph matching algorithm named DSsearch was proposed,which includes four steps: spliting query graph,preprocessing data graph,filtering candidate and combining intermediate results.The query graph splitting stage splits the given directed query graph into a series of two-tier tree structures as query subgraphs.In the data graph preprocessing,the strategy of graph partitioning and perfect neighbor vertex was used to reduce the communication cost among the distributed computing nodes.The DSgraph storage structure is designed to store candidate vertices during the filtering candidate vertex stage,and the intermediate results of redundancy are reduced by delaying the Cartesian product.Accelerate the merge speed with auxiliary candidate vertices in the combining intermediate result phase.Finally,a comparative experiment was designed and real data sets verification was performed on a Spark distributed cluster with 7 compute nodes.The experimental data fully demonstrates that the data transmission cost and the number of intermediate results in the distributed system are the main factors affecting the efficiency of distributed subgraph matching algorithm.Realizing the preprocessing of the data graph and delaying the Cartesian product solves the bottleneck problem of the performance of distributed subgraph matching and effectively completes the subgraph matching of large-scale graph data.
Keywords/Search Tags:Subgraph matching, Subgraph query, Graph partition, Subgraph isomorphism
PDF Full Text Request
Related items