Query Processing And Optimization Of SPARQL Based On GPU

Posted on:2019-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Zhang

Full Text:PDF

GTID:2428330626452408

Subject:Computer technology

Abstract/Summary:

The Resource Description Framework (RDF) is widely used to represent information on the Web, and SPARQL is the standard query language for RDF data. A SPARQL query typically contains many join operations, which are the bottleneck of SPARQL query processing. Real RDF datasets also tend to be highly sparse, yet this property of the data is often overlooked at the storage level. At the same time, processing RDF queries on the GPU is considered an important way to improve query efficiency. In this thesis, we exploit both the GPU and the sparsity of RDF data for SPARQL query processing and optimization. First, we propose a sparse matrix-based storage scheme for RDF data, which adds a predicate-based hash index to the storage and improves storage efficiency by storing only valid edges. Second, we present a sparse matrix-based SPARQL query optimization method and design a query plan generation algorithm that accounts for both the cost of the join operations and the sparsity of the RDF data, and estimates the overall cost by accumulating all intermediate results that may be generated during the SPARQL query. Third, we develop a scalable sparse matrix-based JOIN algorithm for SPARQL queries that runs in parallel on the GPU to speed up query processing. Finally, to measure the acceleration achieved by the JOIN algorithm on the GPU, we also implement the same JOIN algorithm on the CPU as a baseline. The experimental results show that, compared with existing RDF engines on benchmark RDF datasets, our approach significantly improves the efficiency of SPARQL query processing and scales well. Moreover, the JOIN on the GPU achieves a speedup of approximately 7 times over the CPU. In summary, we build an efficient storage scheme around the characteristics of RDF data and, for query processing, transform traditional relational JOIN operations into matrix operations that the GPU can compute in parallel, which offers a new solution for efficient querying of RDF data.
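
The storage and join ideas summarized above can be illustrated with a small sketch. It is not the thesis implementation: the class `SparseRDFStore`, the function `join_path`, and the sample triples are hypothetical names chosen for illustration, and the code runs on the CPU in plain Python. It only shows how a predicate-keyed hash index over sparse adjacency structures lets a two-pattern join (`?x p1 ?y . ?y p2 ?z`) be evaluated like a row-wise sparse matrix product that touches stored edges only.

```python
from collections import defaultdict


class SparseRDFStore:
    """Hypothetical store: one sparse adjacency structure per predicate."""

    def __init__(self):
        # predicate -> {subject -> set of objects}.
        # Only edges that actually exist are stored (no dense matrix).
        self.by_predicate = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.by_predicate[p][s].add(o)

    def matrix(self, p):
        # Hash lookup on the predicate; rows are subjects, columns are objects.
        # .get() avoids creating an empty entry for an unknown predicate.
        return self.by_predicate.get(p, {})


def join_path(store, p1, p2):
    """Evaluate ?x p1 ?y . ?y p2 ?z and return the set of (x, y, z) bindings.

    Projecting the result onto (x, z) gives the nonzero positions of the
    boolean product of the two predicate matrices.
    """
    a, b = store.matrix(p1), store.matrix(p2)
    result = set()
    # Each row x is processed independently of the others, which is the kind
    # of work a parallel implementation could assign to separate threads.
    for x, ys in a.items():
        for y in ys:
            for z in b.get(y, ()):
                result.add((x, y, z))
    return result


store = SparseRDFStore()
store.add("alice", "knows", "bob")
store.add("bob", "worksAt", "acme")
store.add("carol", "knows", "dave")

bindings = join_path(store, "knows", "worksAt")
```

In this sketch, `carol knows dave` contributes no work to the second lookup beyond one failed hash probe, because `dave` has no `worksAt` row; this is the sense in which storing only valid edges keeps the join proportional to the data that exists.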

Keywords/Search Tags:

Sparsity, SPARQL, Parallel Computing, GPU, RDF Data

Related items

1	Semantic EMR Data SPARQL Query Optimization Mechanisms
2	Research Of Massive Semantic Information Parallel Inference Method Based On Cloud Computing
3	Distributed Semantic Query Based On Sparql
4	Research On Key Problems And Technology In Personal Information Recommendation
5	A parallel computing paradigm for transcription network construction from microarray data
6	The Research And Implementation Of Diversity Demand Oriented Parallel Computing Model
7	Research And Application Of Multi-GPU Parallel Computing Based On OpenCL
8	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
9	The Design And Implementation Of SPARQL Based Semantic Web Data Retrieval System
10	Data Fusion System Design And Implementation Based On Parallel Computing Techniques