Research On Query And Retrieval Techniques On Distributed Knowledge Graph

Posted on:2022-09-27

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q Wang

Full Text:PDF

GTID:1488306731467204

Subject:Software engineering

Abstract/Summary:

Recently, the Resource Description Framework (RDF) has been widely used in various applications to describe resources on the Web. In the RDF model, real-world facts are represented as triples of the form <subject, predicate, object>. An RDF dataset can also be represented as a graph, called a knowledge graph, in which subjects and objects are vertices and triples are edges labeled with property names. As more and more applications publish their datasets in the RDF model, RDF datasets keep growing. How to use different kinds of distributed systems to query and retrieve large knowledge graphs has therefore become a hot and challenging research topic. This dissertation studies how to handle different types of query and retrieval tasks in different kinds of distributed systems. The distributed systems studied include federated systems and partitioning-based systems. The query and retrieval tasks include keyword search, structural queries, and distance queries. The work consists of three parts:

(1) Keyword Search over Federated RDF Systems
For a federated knowledge graph system consisting of a set of SPARQL endpoints, we study the problem of keyword search. In the offline phase, we first merge the classes in the different SPARQL endpoints to build a schema graph. In the online phase, we use the full-text search interfaces provided by the SPARQL endpoints to map keywords to classes in the schema graph, and generate structural queries by exploring the schema graph. We then send the generated queries to the SPARQL endpoints for evaluation. Theoretical analysis and experimental results show that our approaches are effective and efficient.

(2) Partitioning-based SPARQL Query Optimization
In a distributed knowledge graph system that tightly couples many centralized knowledge graph systems, it is common to handle structural queries by partitioning the knowledge graph into parts that are then distributed across sites. The main problem with this method is that structural query evaluation may produce too many intermediate results. To reduce the number of intermediate results and improve query performance, we present an optimization that assembles variables' candidates at the sites before the distributed execution plan is selected, and we design a model based on total running time to estimate the benefit of the optimization. Experiments over large RDF datasets confirm the effectiveness of our optimization technique, which can be seamlessly combined with existing distributed knowledge graph systems.

(3) Optimizing Distance Computation Based on Landmarks
As knowledge graphs grow, the performance of traditional distributed distance computation methods cannot meet the requirements of recent applications. We propose a landmark-based framework to optimize distance queries over cloud-based distributed graph systems. We propose a measure called set betweenness to select the optimal set of landmarks for distance computation. Although selecting the optimal set of landmarks can be proven NP-hard, we propose a heuristic distributed algorithm with a guaranteed approximation ratio. Experiments on large knowledge graphs show that our methods outperform existing methods and that performance differs greatly across cloud-based distributed graph systems: performance in Pregel+ is two to three times that in Giraph and GraphX.
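
As a rough illustration of two ideas in the abstract, the sketch below first turns RDF triples into a labeled graph (subjects and objects as vertices, predicates as edge labels) and then answers distance queries through precomputed landmark distances. It is a single-machine toy, not the dissertation's distributed algorithm: the triples and the names `build_graph`, `build_index`, and `estimate_distance` are made up for this example, edges are treated as undirected, and the landmarks are picked by hand rather than by the set betweenness measure. The estimate is the standard triangle-inequality upper bound d(u, v) <= d(u, l) + d(l, v), minimized over the landmarks l.

```python
from collections import defaultdict, deque

# Hypothetical toy triples (subject, predicate, object); not taken from the dissertation.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "berlin"),
    ("carol", "livesIn", "berlin"),
    ("alice", "knows", "dave"),
]


def build_graph(triples):
    """Subjects and objects become vertices; each triple becomes a labeled edge."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))  # assumption: ignore edge direction for distances
    return adj


def bfs_distances(adj, source):
    """Exact hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for _label, v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_index(adj, landmarks):
    """Offline step: one BFS per landmark (landmarks chosen by hand here)."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(index, u, v):
    """Online step: upper bound min over landmarks of d(u, l) + d(l, v)."""
    best = None
    for dist in index.values():
        if u in dist and v in dist:
            bound = dist[u] + dist[v]
            if best is None or bound < best:
                best = bound
    return best  # None if no landmark reaches both vertices


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    index = build_index(graph, ["acme"])
    print(estimate_distance(index, "alice", "carol"))  # 4 (exact: acme is on the shortest path)
    print(estimate_distance(index, "alice", "dave"))   # 5 (true distance is 1)
```

With the single landmark "acme", the alice-to-carol estimate is exact because the landmark lies on a shortest path, while the alice-to-dave estimate is loose. This gap is why the choice of landmark set matters; adding "alice" as a second landmark tightens the second estimate to 1.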

Keywords/Search Tags:

Federated RDF System, Keyword Search, Distributed RDF Systems, SPARQL Query, Shortest Path Query

Related items

1	Research On Complex Query Processing And Optimization Method In Federated Distributed RDF System
2	SPARQL Federated Query And Its Application On The Semantic Web
3	Keyword Query For RDF Data Based On Query Translation
4	Shortest Distance Query And Spatial Keyword Query Based On Key Nodes On Road Networks
5	Study On Shortest Path Query On Road Networks Using Graph Partitioning Methods
6	Approximate Query Method Based On Relational Database Keyword Semantic Research
7	Research On Keyword Search Based On Knowledge Graph In Federated RDF System
8	Research On SPARQL Query Engine Across Different Storage Platform
9	Research On Distributed RDF Query Processing
10	Research On Distributed Query Processing And Optimization Of RDF Data