Font Size: a A A

Large Scale Graph Processing Framework Based Distributed Subgraph Isomorphism

Posted on:2019-02-10Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:2348330542998741Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In the era of big data,a large number of computational problems are involved in the processing of large-scale graph structure data,such as querying and retrieval for the Internet,social networks,instant messaging networks and quiz networks.Subgraph isomorphism is one of the classic problems in the field of graph computation.Due to its computational complexity of NP-complete,it is more and more difficult to apply the traditional subgraph isomorphism algorithms performed on single machine to the growing large-scale graphs.To address the problems mentioned above,this paper studies the distributed subgraph isomorphism algorithm based on the large-scale graph parallel processing framework.Our main contributions are as follows:First,a query decomposition algorithm based on the eccentricity is proposed.In order to reduce the computation cost and traffic of the querying of subqueries,this paper proposes an eccentricity based query decomposition strategy.Compared with the traditional greedy decomposition method,this approach can decrease the number of subqueries in most cases and meanwhile,the subqueries has greater selectivity.Second,hop-separated label comparison based index construction of directed graphs.In order to reduce the size of candidate subgraph set of isomorphic matching,this paper proposes an index construction strategy for directed graphs.The index calculates and inspects the label statistics within every hop,which has greater pruning power.In addition,a label-based index loading optimization technique is proposed,which can reduce the memory usage of the index.Third,This paper proposes an intermediate query results merging technique based on partition distance.The merging of the intermediate results of the partitions brings in vast communication and computational overhead.In order to reduce this overhead,this paper checks the merging necessity of partitions by calculating their distance to avoid unnecessary communication and join operations.Based on extensive experiments on distributed clusters,it is verified that the architecture and algorithms proposed in this paper can improve the performance of the subgraph isomorphism on large-scale graphs to a certain extent.
Keywords/Search Tags:graph processing, graph decomposition, subgraph isomorphism, distributed computing
PDF Full Text Request
Related items