Research On Distributed Query Processing And Optimization Of RDF Data

Posted on:2019-09-27

Degree:Master

Type:Thesis

Country:China

Candidate:H Qiu

Full Text:PDF

GTID:2428330566998097

Subject:Computer Science and Technology

Abstract/Summary:

RDF has become increasingly popular because of its flexible, general-purpose graph-like data model, and the volume of RDF data is growing rapidly. Storing and processing RDF data on a single machine is becoming infeasible, which calls for distributed, parallel approaches. Against this background, this thesis focuses on distributed query processing and optimization for RDF data. The main contributions are as follows.

First, we propose SQX, a novel method for evaluating SPARQL queries on Spark GraphX. SQX treats RDF data as a large property graph and executes SPARQL queries in a graph-parallel way. It adopts a "query tree matching" + "result filtering" approach: the query is decomposed into a query tree and a set of non-tree edges; the query tree is matched bottom-up, level by level, so that multiple query edges of the tree can be matched in one superstep; and the final result is generated from the iteration results that satisfy the non-tree edges. Beyond basic SPARQL queries, we also implement support for SPARQL keywords such as FILTER, OPTIONAL, and UNION.

Second, building on SQX, we propose a novel distributed SPARQL query optimization algorithm based on the Pregel graph processing model. Different triple execution orders incur different data transmission costs, so query optimization can select the best iteration order for a query and shorten its execution time. We assign a reasonable weight to each edge of the query tree using a statistics-based edge weight allocation strategy, update the edge weights based on predicate-pair co-occurrence, and estimate the query cost in order to choose the optimal query plan. Among plans with the same number of iterations, the least expensive one is used as the final query evaluation plan.

Third, we implement approximate SPARQL querying. When users do not understand the underlying knowledge base well enough, the SPARQL queries they write may not return correct results. We propose an approximate query scheme based on word embeddings, which constructs the query most similar to the user's query.

Finally, we implement all the algorithms proposed in this thesis and conduct extensive experiments to evaluate their performance. The results show that the algorithms achieve good query efficiency and optimization performance.
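
The decomposition of a query into a query tree plus non-tree edges can be illustrated with a minimal sketch. This is not the SQX implementation, which the abstract does not spell out. It assumes a breadth-first spanning tree rooted at a chosen query variable, and the function name `split_query`, the example predicates, and the choice of root are hypothetical. Tree edges are grouped by depth and listed deepest level first, mirroring the bottom-up, level-by-level matching order; the remaining triple patterns are the non-tree edges used for result filtering.

```python
from collections import deque


def split_query(triples, root):
    """Split triple patterns into BFS tree levels (deepest first) and non-tree edges.

    Assumption: a BFS spanning tree from `root`; the thesis may build its tree differently.
    """
    # Undirected adjacency over query vertices; each entry keeps the triple's index.
    adjacency = {}
    for idx, (s, _p, o) in enumerate(triples):
        adjacency.setdefault(s, []).append((o, idx))
        adjacency.setdefault(o, []).append((s, idx))

    depth = {root: 0}
    tree_levels = {}  # depth of the child vertex -> triples matched at that level
    used = set()      # indices of triples chosen as tree edges
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w, idx in adjacency.get(v, []):
            if idx in used or w in depth:
                continue  # already a tree edge, or it would close a cycle
            depth[w] = depth[v] + 1
            used.add(idx)
            tree_levels.setdefault(depth[w], []).append(triples[idx])
            queue.append(w)

    non_tree = [t for i, t in enumerate(triples) if i not in used]
    bottom_up = [tree_levels[d] for d in sorted(tree_levels, reverse=True)]
    return bottom_up, non_tree


# Hypothetical basic graph pattern with one cycle (?x -> ?y -> ?c <- ?x).
query = [
    ("?x", "advisor", "?y"),
    ("?x", "takesCourse", "?c"),
    ("?y", "teacherOf", "?c"),
    ("?y", "worksFor", "?d"),
]
levels, non_tree = split_query(query, "?x")
for step, edges in enumerate(levels, start=1):
    print("superstep", step, edges)
print("filter on non-tree edges:", non_tree)
```

In this sketch every triple pattern in one level could be matched in the same superstep, and the single non-tree edge (`?y teacherOf ?c`) is checked only against the candidate results produced by the tree matching.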

Keywords/Search Tags:

RDF, SPARQL, Spark GraphX, query processing, query optimization, approximate query

Related items

1	The Research On Structured Query Generation Framework Based On Semantic Query Graph
2	Research On Query Processing Technologies Over Large Scale Knowledge Graphs
3	Exact And Approximate Map Data Query Query Research
4	Research On SPARQL Query Engine Across Different Storage Platform
5	Research On Large-scale RDF Data Query Algorithm Based On Graph
6	Unified SPARQL Query-analyzing Language Design And Theoretical Research
7	Research Of Approximate Query Processing Technology For Large Scale Data
8	Keyword Query For RDF Data Based On Query Translation
9	Research On The Top-k Query Processing Optimization Algorithms For The Sensor Networks
10	Research On Query And Retrieval Techniques On Distributed Knowledge Graph