The Resource Description Framework (RDF) has been widely used to represent information on the Web, and SPARQL is the standard query language for manipulating RDF data. A SPARQL query typically contains many join operations, which are the main efficiency bottleneck in SPARQL query processing. In addition, real RDF datasets often exhibit strong data sparsity, yet this property of the data is often overlooked at the storage level. Moreover, processing queries over RDF data on the GPU is currently considered an important way to improve query efficiency. In this paper, we exploit the GPU and the inherent sparsity of RDF data to make SPARQL query processing and optimization efficient. First, we propose a sparse matrix-based storage scheme for RDF data, which builds a predicate-based hash index over the storage and improves storage efficiency by storing only valid edges. Second, we present a sparse matrix-based SPARQL query optimization method and design a query plan generation algorithm that takes into account both the cost of join operations and the sparsity of the RDF data, and estimates the overall cost by accumulating all intermediate results that may be generated during the execution of the SPARQL query. Third, we develop a scalable sparse matrix-based JOIN algorithm that parallelizes SPARQL query processing on the GPU to speed up query execution. Finally, to illustrate the acceleration that the JOIN algorithm achieves on the GPU, we also implement it in a CPU environment as a baseline against which the GPU version is compared. Experimental results on benchmark RDF datasets show that, compared with existing RDF engines, our approach significantly improves the efficiency of SPARQL query processing and scales well. Moreover, the JOIN on the GPU achieves a speedup of approximately 7 times over the CPU. In summary, we exploit the characteristics of RDF data to build an efficient storage model, and for query processing we transform traditional relational JOIN operations into matrix operations that can be executed in parallel on the GPU, which provides a new solution for efficient querying of RDF data.
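To make the two core ideas above concrete (a predicate-keyed hash index over sparse matrices that store only valid edges, and a join expressed as a matrix operation), the following is a minimal CPU-only sketch under stated assumptions. It is not the authors' storage format or GPU JOIN algorithm. The names `build_index` and `sparse_bool_matmul` and the sample triples are hypothetical. Each predicate's matrix is kept row-wise as `{subject_id: set(object_ids)}`, and the chain pattern `?x knows ?y . ?y worksAt ?z` is evaluated as a boolean sparse matrix product. As a simplification, this product returns only `(?x, ?z)` pairs, so the join variable `?y` is projected away.

```python
from collections import defaultdict

# Hypothetical RDF triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "initech"),
    ("carol", "worksAt", "acme"),
]


def build_index(triples):
    """Hypothetical predicate-based hash index: predicate -> sparse boolean matrix.

    Each matrix is stored row-wise as {subject_id: set(object_ids)}, so only
    existing (valid) edges occupy space; absent entries are implicit zeros.
    """
    ids = {}

    def term_id(term):
        # Assign dense integer ids to RDF terms in order of first appearance.
        return ids.setdefault(term, len(ids))

    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][term_id(s)].add(term_id(o))
    return index, ids


def sparse_bool_matmul(a, b):
    """Boolean sparse product C = A x B over row-wise sparse matrices.

    C[i][k] is true iff some j has A[i][j] and B[j][k]. This acts as a join on
    the shared variable (A's object column matched with B's subject column),
    with that shared variable projected out of the result.
    """
    c = defaultdict(set)
    for i, row in a.items():
        for j in row:
            c[i] |= b.get(j, set())
    return {i: ks for i, ks in c.items() if ks}


index, ids = build_index(TRIPLES)
names = {v: k for k, v in ids.items()}

# ?x knows ?y . ?y worksAt ?z   (?y projected away)
result = sparse_bool_matmul(index["knows"], index["worksAt"])
pairs = sorted((names[i], names[k]) for i, ks in result.items() for k in ks)
print(pairs)  # [('alice', 'initech'), ('bob', 'acme')]
```

The row-wise sets mirror the "store only valid edges" idea: memory grows with the number of triples per predicate, not with the square of the number of terms. A GPU implementation would typically use a compressed layout such as CSR and parallelize over rows, but this sketch is sequential Python.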