Efficient SPARQL Theta Join Processing On Large Scale RDF Graphs

Posted on: 2017-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2348330503989889
Subject: Computer software and theory
Abstract/Summary:
RDF (Resource Description Framework) can represent information ranging from structured to unstructured data. The rapid growth of RDF data across many fields has led researchers and commercial companies to build efficient RDF stores for data analytics. Theta join queries are common in practice, yet SPARQL syntax for theta joins was not defined until 2013. Although some work processes theta joins over RDF data on key-value stores or RDBMSs, few RDF stores can process SPARQL theta joins. Processing theta joins on an RDBMS or a key-value store is costly because there is no RDF-native optimization, and effective processing of theta join queries directly on the unstructured graph has not been fully explored. The challenge is that large RDF graphs are schema-free and that processing a SPARQL theta join is similar to computing a Cartesian product.

ThetaStore aims to process theta joins and equi-joins efficiently in a uniform way. First, we assign order-preserving IDs to the vertices of the RDF graph, so that a theta join does not need to be translated into a set of equi-joins. Second, SPARQL queries are decomposed into star-shaped sub-queries and a query plan is generated for each sub-query, which is cheaper than generating a full plan. Finally, we process theta joins concurrently to improve performance. On large graphs the intermediate results can become very large, which degrades performance, so we use a strategy that quickly propagates constraints to adjacent sub-queries. We implemented ThetaStore, a system that processes theta join queries on large RDF graphs. Our experimental results show that ThetaStore outperforms state-of-the-art systems in query response time.
On the one hand, ThetaStore processes theta join queries directly on the encoded IDs and avoids translating a theta join into a large number of equi-joins. On the other hand, it uses parallelization and optimization to keep intermediate results as small as possible.
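
The idea of evaluating a theta join directly on order-preserving IDs can be illustrated with a minimal sketch. This is not ThetaStore's implementation: the function names, the dictionary-based encoding, and the sample values are all assumptions chosen for illustration. The sketch only shows why an encoding that preserves value order lets a `<` comparison run on integer IDs, without enumerating matching value pairs as separate equi-joins.

```python
from bisect import bisect_right

def assign_order_preserving_ids(values):
    """Hypothetical encoder: map each distinct value to an integer ID
    so that v1 < v2 implies id(v1) < id(v2)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def theta_join_less_than(left_ids, right_ids):
    """Return all pairs (l, r) with l < r, comparing encoded IDs only."""
    right_sorted = sorted(right_ids)
    pairs = []
    for l in left_ids:
        # Index of the first right-hand ID strictly greater than l.
        start = bisect_right(right_sorted, l)
        pairs.extend((l, r) for r in right_sorted[start:])
    return pairs

# Illustrative literal values (assumed data, e.g. prices in an RDF graph).
prices = [30, 10, 20]
ids = assign_order_preserving_ids(prices)

left = [ids[10], ids[20]]   # bindings of the left variable
right = [ids[20], ids[30]]  # bindings of the right variable
result = theta_join_less_than(left, right)
print(result)
```

Because the right-hand side is sorted once and each probe is a binary search, the work per left binding is proportional to the number of matches rather than to the full Cartesian product.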
Keywords/Search Tags: Resource Description Framework, Theta Join, SPARQL, Query processing