Efficient SPARQL Theta Join Processing On Large Scale RDF Graphs

Posted on:2017-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:T Wang

GTID:2348330503989889

Subject:Computer software and theory

Abstract/Summary:

RDF (Resource Description Framework) can represent information ranging from structured to unstructured data. The continuing growth of RDF data across many fields has led researchers and commercial companies to build efficient RDF stores for data analytics. Theta join queries are common in practice, yet SPARQL syntax for Theta joins was not defined until 2013. Although some work processes Theta joins over RDF data on key-value stores or RDBMSs, few RDF stores can process SPARQL Theta joins. Processing Theta joins on an RDBMS or a key-value store is costly because no RDF-native optimization is available, and effective ways to process Theta join queries directly on the unstructured graph have not been fully explored. The challenge is that a large RDF graph is schema-free, and processing a SPARQL Theta join resembles computing a Cartesian product.

ThetaStore aims to process Theta joins and equi-joins efficiently in a uniform way. First, we assign order-preserving IDs to the vertices of the RDF graph, so that a Theta join does not need to be translated into a set of equi-joins. Second, we decompose each SPARQL query into star-shaped sub-queries and generate a query plan for each sub-query, which is cheaper than generating a single plan for the full query. Finally, we process Theta joins concurrently to improve performance. On large graphs the intermediate results can become very large and degrade performance, so we use a strategy that quickly propagates constraints to adjacent sub-queries.

We implemented ThetaStore, a system that processes Theta join queries on large RDF graphs. Our experimental results show that ThetaStore outperforms state-of-the-art systems in query response time. On the one hand, ThetaStore processes Theta join queries directly on the encoded IDs and avoids translating a Theta join into a large number of equi-joins. On the other hand, it uses parallelization and optimization to keep intermediate results as small as possible.
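
The abstract does not describe how ThetaStore encodes vertices, so the following is only a minimal sketch of the general idea behind order-preserving IDs, not the thesis's implementation. It assumes literal values from one comparable domain; the function names `assign_order_preserving_ids` and `theta_join_less_than` and the sample values are hypothetical. Because the IDs sort in the same order as the values, a `<` join can be evaluated on the IDs alone with a binary search, with no expansion into equi-joins.

```python
from bisect import bisect_right


def assign_order_preserving_ids(values):
    """Map each distinct value to an integer ID so that a < b implies id(a) < id(b)."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def theta_join_less_than(left_ids, right_ids):
    """Return every (l, r) pair with l < r, comparing encoded IDs only."""
    right_sorted = sorted(right_ids)
    pairs = []
    for l in left_ids:
        # Index of the first right-side ID strictly greater than l.
        start = bisect_right(right_sorted, l)
        pairs.extend((l, r) for r in right_sorted[start:])
    return pairs


# Hypothetical literal values bound by two sub-queries.
left_values = [30, 10]
right_values = [20, 30, 40]

# Encode both sides with one shared dictionary so IDs are comparable.
ids = assign_order_preserving_ids(left_values + right_values)
left_ids = [ids[v] for v in left_values]
right_ids = [ids[v] for v in right_values]

# Join on IDs, then decode back to the original values.
pairs = theta_join_less_than(left_ids, right_ids)
decode = {idx: value for value, idx in ids.items()}
decoded = [(decode[l], decode[r]) for l, r in pairs]
```

The output size of a Theta join can still approach that of a Cartesian product; the encoding only removes the need to decode values or enumerate matching equalities during the comparison.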

Keywords/Search Tags:

Resource Description Framework, Theta Join, SPARQL, Query processing

Related items

1	The Research Of Distributed RDF Data Processing Architecture
2	Cooperative Query Processing On Heterogeneous Processors
3	Research Of Federated Query Method For Linked Data Based On Semi-join
4	Query Processing And Optimization Of SPARQL Based On GPU
5	Join Processing And Optimizing On Large Clusters
6	Hadoop Based Efficient Join Algorithm Research On GPU
7	Processing Theta-Joins on Shared-Nothing Systems
8	Research On Optimizing Top-k Join Queries Based On SPARQL-RANK
9	Temporal RDF Query Language And Its Transformation To TSQL2
10	Research On SPARQL Query Engine Across Different Storage Platform