Font Size: a A A

The Study Of Graph Querying Algorithms Based On Spark

Posted on:2018-08-16Degree:MasterType:Thesis
Country:ChinaCandidate:Y J ZhangFull Text:PDF
GTID:2348330512475596Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As an important data structure,graph has a strong power of expression which can describe a lot of network problems in reality.And as the growth of the scale of data in the internet,graphs become more and more large.So how to search for the information effectively in a large data graph is one of the hot research problems now.Subgraph isomorphism algorithm is the key of data graph querying.Subgraph isomorphism problem have been proved to be an NP-complete problem,which cannot be found a polynomial time algorithm to deal with.In order to complete a querying under a large scale of data graph in an acceptable period of time,the ideal solution is to design an efficient algorithm and improve the performance of processing platform at the same time.To improve the performance of the subgraph isomorphism querying algorithm need to consider from the four aspects which are the indexing strategy,the pruning rule,the candidate set choosing strategy and the matching rule,in an attempt to improve the algorithm's efficiency from the parts to the overall.For the situations above,this paper carried on the multiple aspects of the work as follow.First of all,this paper studied and designed a vertex-neighbor-based indexing structure named VNIndex.The main principle of this indexing is to make a separate connection for each vertex with its neighbors,and get the vertex and edge information which is organized to be an indexing data with the vertex itself as the leading.This indexing can compress the data graph and query efficiently under the smallest indexing scale.Secondly,this paper designed an exact subgraph isomorphism querying algorithm called VNQuery based on VNIndex.This paper designed a pruning strategy combining pre-pruning and pruning in matching process which detailed the pruning rules and improved the effect of pruning process.Besides,this paper designed and implemented an improved candidate set choosing strategy with a querying sequence based on neighbor extension,which reduced the scale of candidate set and accelerated the querying speed.And thirdly,this paper applied an improved vertex matching rules with four kinds and seven methods to complete the confirmation,and meanwhile for the fine-grained pruning.This paper implemented the algorithm on the Spark distributed platform based on the automatic graph cut characteristics and storage of GraphX model,and made experiments on two data sets.The experimental results showed that the VNQuery algorithm designed in this paper had an obvious efficiency advantage on large data sets.
Keywords/Search Tags:Graph Querying, Subgraph Isomorphism, Index, Candidate Set, Pruning-Validation, Spark
PDF Full Text Request
Related items