
Research On Query And Retrieval Techniques On Distributed Knowledge Graph

Posted on: 2022-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 1488306731467204
Subject: Software engineering
Abstract/Summary:
Recently, the Resource Description Framework (RDF) has been widely used in various applications to describe resources on the Web. In the RDF model, real-world facts are represented as triples of the form <subject, predicate, object>. An RDF dataset can also be represented as a graph, called a knowledge graph, in which subjects and objects are vertices and triples are edges labeled with property names. As more and more applications publish their datasets in the RDF model, RDF datasets keep growing. How to use different kinds of distributed systems to query and retrieve large knowledge graphs has therefore become a hot and challenging research topic. In this dissertation, we study how to handle different types of query and retrieval tasks in different kinds of distributed systems. The distributed systems we study include federated systems and partitioning-based systems. The query and retrieval tasks include keyword search, structural queries and distance queries. The research work of this dissertation consists of three parts:

(1) Keyword Search over Federated RDF Systems. For a federated knowledge graph system that consists of several SPARQL endpoints, we study the problem of keyword search. In the offline phase, we first merge the classes in the different SPARQL endpoints to build a schema graph. In the online phase, we use the full-text search interfaces provided by the SPARQL endpoints to map keywords to classes in the schema graph, and generate structural queries by exploring the schema graph. We then send the generated queries to the SPARQL endpoints for evaluation. Theoretical analysis and experimental results show that our approaches are effective and efficient.

(2) Partitioning-based SPARQL Query Optimization. In a distributed knowledge graph system that tightly couples many centralized knowledge graph systems, structural queries are commonly handled by partitioning the knowledge graph into parts and distributing them across sites. The main problem with this method is that structural query evaluation may produce too many intermediate results. To reduce the number of intermediate results and improve query performance, we present an optimization that assembles the candidates of variables at the sites before the distributed execution plan is selected, and we design a model based on total running time to estimate the benefit of the optimization. Experiments over large RDF datasets confirm the effectiveness of our optimization technique, which can be seamlessly combined with existing distributed knowledge graph systems.

(3) Optimizing Distance Computation Based on Landmarks. As knowledge graphs grow, the performance of traditional distributed distance computation methods cannot meet the requirements of recent applications. We propose a landmark-based framework to optimize distance queries over cloud-based distributed graph systems. We propose a measure called set betweenness to select the optimal set of landmarks for distance computation. Although we prove that selecting the optimal set of landmarks is NP-hard, we propose a heuristic distributed algorithm with a guaranteed approximation ratio. Experiments on large knowledge graphs show that our methods outperform existing methods, and that performance differs greatly across cloud-based distributed graph systems: performance on Pregel+ is two to three times that on Giraph and GraphX.
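
The triple-as-edge view and the general idea of landmark-based distance estimation can be illustrated with a minimal single-machine sketch. It is not the dissertation's algorithm: the landmark here is picked by hand rather than by set betweenness, edges are assumed to be undirected and unweighted, nothing is distributed, and the triples and all function names (`build_graph`, `bfs_distances`, `landmark_upper_bound`) are hypothetical. The sketch precomputes BFS distances from each landmark and then bounds the distance between two vertices by the shortest detour through a landmark.

```python
from collections import defaultdict, deque

def build_graph(triples):
    """Build an adjacency list: subjects and objects are vertices, triples are edges.
    Assumption: edges are treated as undirected and predicate labels are ignored."""
    adj = defaultdict(set)
    for s, _p, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj

def bfs_distances(adj, source):
    """Exact hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def landmark_upper_bound(index, s, t):
    """Upper bound on d(s, t): min over landmarks l of d(s, l) + d(l, t).
    Returns None if no landmark reaches both s and t."""
    best = None
    for dist in index.values():
        if s in dist and t in dist:
            candidate = dist[s] + dist[t]
            if best is None or candidate < best:
                best = candidate
    return best

# Hypothetical toy dataset, forming the path alice - bob - carol - acme - dave.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("dave", "worksAt", "acme"),
]
adj = build_graph(triples)

# Hand-picked landmark (an assumption; the abstract selects landmarks by set betweenness).
landmarks = ["carol"]
index = {l: bfs_distances(adj, l) for l in landmarks}

exact = bfs_distances(adj, "alice")["dave"]
bound = landmark_upper_bound(index, "alice", "dave")
print(exact, bound)
```

In this toy graph the bound is tight for alice and dave because the landmark lies on their shortest path, while for alice and bob it overestimates the true distance. This is why the choice of landmarks matters for estimate quality.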
Keywords/Search Tags: Federated RDF System, Keyword Search, Distributed RDF Systems, SPARQL Query, Shortest Path Query