Font Size: a A A

The Research Of Distributed Regular Path Queries Over Large RDF Knowledge Graphs

Posted on:2019-04-03Degree:MasterType:Thesis
Country:ChinaCandidate:Y Q XinFull Text:PDF
GTID:2428330626452089Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the increasing popularity of knowledge graphs,Resource Description Framework(RDF)has been widely recognized as a flexible graph-like data model to represent large-scale knowledge bases.It has become essential to realize efficient and scalable query processing for big RDF knowledge graphs in various domains,such as social networking,bioinformatics and government public opinion.As one of the fundamental operations for querying graph data,regular path queries(RPQs)can explore RDF graphs in a navigational manner,which have been attracting increasing research efforts.However,the existing query processing approaches mainly focus on the standard semantics of RPQs,which cannot provide provenance of the answer sets.This is not conducive to user observation and analysis.In this paper,we propose a Pregel-based approach to processing regular paths queries under provenance-aware semantics,called P3 RPQ,for distributed large-scale RDF knowledge graphs.A regular path query is transformed into an automaton,then the vertices in an RDF graph and the states in the automaton are matched.Our algorithm can make full use of the vertex-centric Pregel parallel computing model.In each superstep,vertex-state matching pairs are constructed at each active vertex in-parallel.Then each active vertex sends messages to the adjacency vertices to generate intermediate result paths in a navigational manner.At the same time,we analyze the cost of the proposed method and design the optimization strategy according to the cost model.In this paper,candidate-state precomputation and edge-filtering are designed to reduce the overhead of vertex computation.Further,sending-messages pruning and encoding messages based on variable length bytes are designed to reduce the scale of intermediate results and communication overhead.We also design message selection and message compression strategy to overcome the counting-paths problem to some extent.We design various regular path queries,and the proposed method and optimization strategy are verified by extensive experiments on both synthetic and real-world datasets.The experimental results verify the effectiveness,efficiency and scalability of the method and strategy,and prove that P3 RPQ can efficiently answer provenance-aware RPQs over large RDF knowledge graphs.
Keywords/Search Tags:Regular path query, Provenance-aware, RDF knowledge graphs, Pregel
PDF Full Text Request
Related items