Research On Distributed Query Processing And Optimization Of RDF Data

Posted on: 2019-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Qiu
Full Text: PDF
GTID: 2428330566998097
Subject: Computer Science and Technology
Abstract/Summary:
RDF has become increasingly popular thanks to its flexible, general-purpose graph-like data model, and the volume of RDF data is growing rapidly. Storing and processing RDF data on a single machine has become infeasible, which creates a need for distributed, parallel approaches. Against this background, this thesis focuses on distributed query processing and optimization for RDF data. The main contributions are summarized as follows.

First, we propose a novel method called SQX for evaluating SPARQL queries on Spark GraphX. SQX treats RDF data as a large property graph and executes SPARQL queries in a graph-parallel way. It adopts a "query tree matching" + "result filtering" approach: the query is decomposed into a query tree and a set of non-tree edges; the query tree is matched bottom-up, level by level, so that multiple query edges in the tree can be matched in one superstep; and the final result is generated from the iteration results that satisfy the non-tree edges. Beyond basic SPARQL queries, we also implement support for SPARQL keywords such as FILTER, OPTIONAL, and UNION.

Second, building on SQX, we propose a novel distributed SPARQL query optimization algorithm based on the Pregel graph processing model. Different triple execution orders incur different data transmission costs, so query optimization can select the best iteration order for a query and shorten its execution time. We assign a reasonable weight to each edge of the query tree using a statistics-based edge weight allocation strategy, update the edge weights based on predicate-pair co-occurrence, and estimate the query cost in order to choose the optimal query plan. Among plans with the same number of iterations, the least expensive one is used as the final query evaluation plan.

Third, we implement approximate SPARQL querying. When users do not understand the underlying knowledge base well enough, the SPARQL queries they write may not return correct results. We propose an approximate query scheme based on word embeddings, which constructs the query most similar to the user's query.

Finally, we implement all the algorithms proposed in this thesis and conduct extensive experiments to verify their performance. The results show that the algorithms achieve good query efficiency and optimization performance.
Keywords/Search Tags: RDF, SPARQL, Spark GraphX, query processing, query optimization, approximate query
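The "query tree matching" + "result filtering" idea from the abstract can be illustrated with a small single-machine sketch. This is not the SQX implementation: it uses no Spark GraphX or Pregel supersteps, and it matches tree edges one at a time in breadth-first order rather than bottom-up, level by level. The function names (`split_query`, `match_pattern`, `evaluate`) and the toy data are hypothetical, and the sketch assumes a connected query whose predicates are constants.

```python
from collections import deque


def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def split_query(patterns, root):
    """Split triple patterns into spanning-tree edges and non-tree edges.

    A breadth-first traversal of the query graph (treated as undirected)
    starts at `root`. A pattern that reaches a not-yet-visited node becomes
    a tree edge; every remaining pattern is a non-tree edge.
    """
    visited = {root}
    queue = deque([root])
    used = set()
    tree = []
    while queue:
        node = queue.popleft()
        for i, (s, p, o) in enumerate(patterns):
            if i in used:
                continue
            if s == node and o not in visited:
                other = o
            elif o == node and s not in visited:
                other = s
            else:
                continue
            used.add(i)
            visited.add(other)
            tree.append((s, p, o))
            queue.append(other)
    non_tree = [pat for i, pat in enumerate(patterns) if i not in used]
    return tree, non_tree


def match_pattern(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return new


def evaluate(patterns, triples, root):
    """Match the tree edges first, then filter results by the non-tree edges."""
    tree, non_tree = split_query(patterns, root)
    bindings = [{}]
    for pat in tree:  # query tree matching
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match_pattern(pat, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    for pat in non_tree:  # result filtering
        bindings = [
            b for b in bindings
            if any(match_pattern(pat, t, b) is not None for t in triples)
        ]
    return bindings


# Toy data: a "knows" graph and a triangle query (both made up for illustration).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("bob", "knows", "dave"),
]
patterns = [
    ("?a", "knows", "?b"),
    ("?b", "knows", "?c"),
    ("?c", "knows", "?a"),
]

tree, non_tree = split_query(patterns, "?a")
results = evaluate(patterns, triples, "?a")
```

In this toy query the cycle-closing pattern `("?b", "knows", "?c")` ends up as the only non-tree edge, so it is checked against bindings that the tree edges have already produced instead of triggering another join. A distributed version would presumably replace the nested loops with message passing between graph vertices.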