Research And Design Of RDF Storage System Based On HBase

Posted on: 2012-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: Q Jin
Full Text: PDF
GTID: 2178330332476029
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Semantic Web technologies, the Resource Description Framework (RDF) is now widely used. However, traditional centralized RDF stores have limitations in handling huge RDF datasets. To resolve this problem, distributed and parallel systems are being introduced into RDF storage. In this thesis, we study RDF storage systems and propose using HBase, a distributed column-oriented database, to store RDF datasets, and using MapReduce to answer RDF queries.

First, we introduce the background of modern RDF storage systems, including the concept of RDF and the standard RDF query language, SPARQL. We then give an overview of existing distributed RDF storage systems and of current research on integrating RDF stores with Hadoop-related technologies.

Next, based on an analysis of RDF storage systems, we propose an approach for storing RDF datasets in HBase. RDF triples are stored in three HBase tables: SPO, POS, and OSP. The approach makes full use of the default index structure provided by HBase, which keeps query response time acceptable while reducing storage space.

After that, we propose a MapReduce strategy for handling SPARQL Basic Graph Patterns (BGPs). We suggest that highly selective triple patterns and joins with small intermediate results should be placed in the earliest MapReduce jobs. We propose a greedy strategy for generating query plans, based on existing research on MapReduce multi-way joins. The evaluation results show that the approach works well on large RDF datasets.

Finally, we build a demo RDF storage system based on the proposed HBase RDF schema and MapReduce query answering strategy.
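The three-table layout can be illustrated with a minimal sketch. It assumes that each table's row key is the triple's terms joined in the order the table name suggests, and that a triple pattern is answered by a prefix scan on whichever table puts the bound terms first. The separator, the function names, and the key encoding are hypothetical and are not taken from the thesis; no HBase client calls are made.

```python
from typing import Dict, Optional, Tuple

SEP = "\x00"  # hypothetical separator between terms in a row key

# Table name -> positions of (subject, predicate, object) in the row key.
TABLES = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def row_keys(s: str, p: str, o: str) -> Dict[str, str]:
    """Build the row key this triple would get in each of the three tables."""
    triple = (s, p, o)
    return {name: SEP.join(triple[i] for i in order)
            for name, order in TABLES.items()}


def choose_table(s: Optional[str], p: Optional[str],
                 o: Optional[str]) -> Tuple[str, str]:
    """Pick the table whose key order gives the longest bound prefix.

    None marks an unbound variable in the triple pattern. Returns the
    table name and the row-key prefix to scan; an empty prefix means a
    full table scan.
    """
    pattern = (s, p, o)
    best_name, best_prefix = "", None
    for name, order in TABLES.items():
        prefix = []
        for i in order:
            if pattern[i] is None:
                break  # bound terms must be contiguous from the key start
            prefix.append(pattern[i])
        if best_prefix is None or len(prefix) > len(best_prefix):
            best_name, best_prefix = name, prefix
    return best_name, SEP.join(best_prefix)
```

Under these assumptions, any combination of bound terms maps to a prefix scan on exactly one of the three tables; for example, a pattern with bound subject and object but unbound predicate is served by OSP, since its key begins with the object followed by the subject.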
Keywords/Search Tags: RDF, Distributed system, HBase, MapReduce