Font Size: a A A

The Research Of Distributed Subgraph Matching Query On Big RDF Graph

Posted on:2019-05-10Degree:MasterType:Thesis
Country:ChinaCandidate:Q XuFull Text:PDF
GTID:2428330626952089Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the development of knowledge graphs,more and more data has been released in the form of the Resource Description Framework(RDF).RDF datasets often contain hundreds of millions of edges,which execeeds the processing capacity of single computer.Therefore,how to efficiently answer SPARQL queries in parallel over big RDF graphs has been widely recognized as a challenging problem.In this paper,we design two efficient distributed subgraph matching query methods on RDF graphs.The first one efficiently answer subgraph matching queries on RDF graphs using MapReduce,which decomposes the query graph into a set of stars by utilizing the semantic and structural information embedded RDF graphs as heuristics,so it can be matched in a few MapReduce iterations.Meanwhile,our paper gives a matching order of these stars to reduce intermediate results in MapReduce iterations.During the matching phase,each round of MapReduce adds one star with the join operation.In addition,our method reduces data input by encoding vertex neighborhood information with Bloom filter,and postpones Cartesian product to improve query performance.The other method processes subgraph match problem using Pregel,which transforms the query graph to a variant spanning tree based on the shortest paths,leverages the Pregel model process SPARQL queries.Two optimization techniques are proposed to improve the efficiency of our algorithms.One employs RDF propertis to filter out local computations and messages passed,the other postpones the Cartesian product operations in the matching process to reduce intermediate results.The extensive experiments on both synthetic dataset and real-world dataset are carried out.The experiments results demonstrate that our star decomposition-based method SDec and Pregel based method SP-Tree can both answer SPARQL BGP queries efficiently,which outperforms SHARD and S2 X by one order of magnitude.Finally,extensive experiments show that the performance of the optimization algorithms is improved significantly than the basic algorithm over both synthetic and real datasets.
Keywords/Search Tags:Big RDF graph, subgraph matching, MapReduce, Pregel
PDF Full Text Request
Related items