Font Size: a A A

Research And Implementation Of Distributed Algorithm For Large-Scale Subgraph Enumeration In Pregel

Posted on:2018-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:X FengFull Text:PDF
GTID:2348330512998171Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Subgraph enumeration,which is an algorithm to find all the subgraphs of a data graph that are isomorphic to a given query graph,is a fundamental graph problem with a wide range of Biochemistry,ecology,and social network analysis.However,With the rapid expansion of data graphs size,the traditional stand-alone subgraph enumeration algorithm has been unable to complete it which execute in the large-scale data graph.Thus,some recent researches focus on solving the distributed subgraph enumeration algorithm in distributed environmentsand and propose some solutions.The performance of the existing distributed subgraph enumeration algorithms mostly depends on the communication cost bwtween machines rather than the computing cost.Therefore,the amount of data transmission is the main factor for the performance of distributed subgraph enumeration algorithm.Many optimization strategies are proposed to minimize the number of intermediate results and reduce redundant calculations by indexing.However,these optimizations still carry out all intermediate results and cause the intermediate results expanding repaidly.This is the reason why the current algorithms cannot handle complex queries and large-scale intermediate results.In this paper,i propose PTSearch,a distributed subgraph enumeration algorithm,based on the existing distributed subgraph enumeration algorithm sand we discuss the deficiency of the related works.We implement our algorithm in Pregel,which is a popular large-scale graph processing system.PTSearch use compressed data structure to reduce the intermediate results transmitted to other machines.We make the following contributions in this paper:First,i design PTSearch,a distributed subgraph enumeration algorithm,which includes three steps:query decomposition,querying partial of matching result and expanding partial of matching result.Second,i develop a query decomposition algorithm,which based on dynameic-adjustment weight and greedy stategy to reduce the iteration number in querying partial of matching result.Third,i implement the process of querying partial matching result on Pregel and devise Gptr,which replaces partial matching result in existing algorithm.Gptr is extremely important data structures in our algorithm,which avoids transferring partial results in network to improve the performance of PTSearch.Fourth,i propose three optimization measures for PTSearch,including reduceing the number of iterations,the batch compression of Gptr and pruning strategy,which using automorphism and vertex degree information to reduced intermediate results.Fifth,i conduct extensive performance studies in four real-world graphs with different graph properties.As a result,PTSearch is 0.1%of PSgL and 33%of TwinTwigJoin on average in network data transmission.PTSearch outperforms PSgL and TwinTwig in the large-scale graphs when these soulutions fail on massive intermediate results.In summary,the PTSearch algorithm reduces the network traffic of the distributed subgraph enumeration and achieves better performance by compressing the intermediate results.In the 2016 Second National Contest on Application and Innovation of Cliud Computing,PTSearch algorithm won the first prize.
Keywords/Search Tags:Subgraph Enumeration, Distributed Algorithm, Graph Analysis, Pregel
PDF Full Text Request
Related items