
Research On A Hashing Index Based RDF Data Storage And Query System And Its Application

Posted on: 2019-02-20  Degree: Master  Type: Thesis
Country: China  Candidate: W W Li  Full Text: PDF
GTID: 2428330626952103  Subject: Computer technology
Abstract/Summary:
In the past decade, the volume of RDF (Resource Description Framework) data, a widely used World Wide Web Consortium standard, has grown enormously, and many RDF datasets now contain billions of triples. As a result, managing this huge volume of RDF data efficiently has become a tremendous challenge. Although several systems, such as gStore and RDF-3X, have been proposed to support RDF storage and SPARQL queries, lengthy comparison operations and the high collision rates that come with data growth have become the Achilles heel of RDF data management systems. In this thesis, we present HTStore, a hashing-index-based system for fast storage of and access to large-scale RDF data. First, HTStore uses hash functions to significantly reduce query time. The index structure has two layers: a hash layer containing a hash table, and a tree layer containing hash trees. In addition, to ensure efficient data exchange between memory and disk, we apply effective pruning rules and efficient search algorithms to the hash tree index. A Counting Bloom Filter is also used in our system to check whether a triple may exist in the dataset before the full dataset is accessed. An adjustment strategy for the index reduces the number of disk accesses and further improves query performance. Finally, we apply the HTStore system to biomedical data. We combined multiple datasets, including genetic data, drug data, and disease data, to form a multi-source RDF dataset, and then loaded this dataset into the HTStore system. Extensive experiments confirm the efficiency and effectiveness of RDF storage and SPARQL queries in our solution. The experimental results demonstrate that the proposed system improves query efficiency by up to 22% compared with representative RDF data management systems such as QLserver, TripleBit, gStore, and MonetDB. The execution time of update operations can be reduced by 25%. We also designed several queries relating genes, drugs, and diseases, which are of practical significance.
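The abstract gives no implementation details for the Counting Bloom Filter, so the following is only a minimal sketch of the general technique, not HTStore's actual code. The class name, the counter-array size, the number of hash functions, and the use of salted BLAKE2b hashes are all assumptions made for illustration. It shows how a triple can be pre-checked before any index or disk access, and why per-slot counters (rather than single bits) allow deletions during updates.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter over RDF triples (hypothetical, not HTStore's code)."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # Sizes are arbitrary assumptions chosen for the example.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, triple):
        # Join subject, predicate, object with a separator unlikely to occur in terms.
        key = "\x1f".join(triple).encode("utf-8")
        for seed in range(self.num_hashes):
            # A different salt per seed yields num_hashes independent hash functions.
            digest = hashlib.blake2b(
                key, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_counters

    def add(self, triple):
        for pos in self._positions(triple):
            self.counters[pos] += 1

    def might_contain(self, triple):
        # False means the triple is definitely absent; True means it may be present.
        return all(self.counters[pos] > 0 for pos in self._positions(triple))

    def remove(self, triple):
        # Only decrement when the triple may be present, so counters never go negative.
        if not self.might_contain(triple):
            return False
        for pos in self._positions(triple):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter()
triple = ("ex:geneA", "ex:associatedWith", "ex:diseaseB")  # made-up example triple
cbf.add(triple)
present_after_add = cbf.might_contain(triple)
cbf.remove(triple)
present_after_remove = cbf.might_contain(triple)
```

A negative answer from `might_contain` lets a query skip the index lookup entirely, while a positive answer still has to be confirmed against the stored data because false positives are possible.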
Keywords/Search Tags: RDF Storage, SPARQL Query, Hash Tree Index, Bloom Filter, Biomedical Data