Research On RDF Data Storage And Query Based On HBase

Posted on: 2014-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhu
GTID: 2248330395495483
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of Semantic Web technology, the Resource Description Framework (RDF), which is used to describe Semantic Web resources, has been widely adopted in various fields. This has led to rapid growth in RDF data, and how to efficiently manage massive RDF data has become a key issue. Most existing RDF data management systems store RDF data in relational databases, an approach that has difficulty managing massive data efficiently. Studies have shown that relational databases are less efficient than distributed databases when dealing with massive RDF data. Therefore, more and more researchers are using distributed systems and parallel computing techniques to manage massive RDF data.

Massive RDF data management involves two main aspects: how to store massive amounts of RDF data effectively, and how to query RDF data efficiently. To address these two problems, this thesis proposes a novel RDF data storage model based on the distributed database HBase and designs basic graph pattern query algorithms for this storage model.

The main work is as follows:

(1) Proposed an RDF data storage model based on the distributed database HBase. The model partitions data according to the classes defined by an ontology described in OWL. Triples that belong to the same class are stored in both the S_PO and O_PS tables of that class. This method makes full use of the row-key index provided by HBase and effectively reduces storage overhead while preserving query performance.

(2) Implemented SPARQL query and update operations on top of the storage model using the HBase Java API. Designed Triple Pattern and Basic Graph Pattern (BGP) query algorithms for this storage model that support some inference. Optimized the BGP algorithm according to the selectivity of triple patterns, shared variables, and the rdf:type property.

(3) Verified the feasibility of the storage model and query algorithms on a stand-alone pseudo-distributed Hadoop system and on a real distributed Hadoop cluster using the LUBM test set. Demonstrated the effectiveness of the proposed method in comparison with existing storage models and query algorithms.
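The storage and query ideas above can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis uses the HBase Java API, while this sketch uses plain Python dictionaries as stand-ins for HBase tables. Several details are assumptions made only for illustration: the row key is the subject in S_PO and the object in O_PS, a triple is assigned to the class given by its subject's rdf:type, untyped subjects fall into a hypothetical "Unclassified" partition, and selectivity is approximated by counting the constants in a triple pattern. The names `ClassPartitionedStore`, `order_patterns`, and `evaluate_bgp` are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
UNCLASSIFIED = "Unclassified"  # assumed fallback partition, not from the thesis


class ClassPartitionedStore:
    """In-memory stand-in for per-class S_PO and O_PS tables."""

    def __init__(self):
        # tables[class][row_key] -> set of (predicate, value) cells
        self.s_po = defaultdict(lambda: defaultdict(set))
        self.o_ps = defaultdict(lambda: defaultdict(set))
        self.class_of = {}  # subject -> class, learned from rdf:type triples

    def load(self, triples):
        triples = list(triples)
        for s, p, o in triples:  # first pass: find each subject's class
            if p == RDF_TYPE:
                self.class_of[s] = o
        for s, p, o in triples:  # second pass: write both tables of that class
            cls = self.class_of.get(s, UNCLASSIFIED)
            self.s_po[cls][s].add((p, o))
            self.o_ps[cls][o].add((p, s))

    def match(self, s=None, p=None, o=None):
        """Match one triple pattern; None marks an unbound position."""
        results = set()
        if s is not None:
            # Subject bound: a single row lookup in that class's S_PO table.
            cls = self.class_of.get(s, UNCLASSIFIED)
            for pred, obj in self.s_po.get(cls, {}).get(s, ()):
                if p in (None, pred) and o in (None, obj):
                    results.add((s, pred, obj))
        elif o is not None:
            # Object bound: a row lookup in each class's O_PS table.
            for table in self.o_ps.values():
                for pred, subj in table.get(o, ()):
                    if p in (None, pred):
                        results.add((subj, pred, o))
        else:
            # Neither row key bound: fall back to a full scan.
            for table in self.s_po.values():
                for subj, cells in table.items():
                    for pred, obj in cells:
                        if p in (None, pred):
                            results.add((subj, pred, obj))
        return results


def is_var(term):
    return term.startswith("?")


def order_patterns(patterns):
    """Assumed heuristic: patterns with more constants are evaluated first."""
    return sorted(patterns, key=lambda tp: -sum(not is_var(t) for t in tp))


def evaluate_bgp(store, patterns):
    """Join triple patterns by propagating bindings of shared variables."""
    solutions = [{}]
    for tp in order_patterns(patterns):
        next_solutions = []
        for binding in solutions:
            # Substitute already-bound variables; unbound ones become None.
            bound = [binding.get(t) if is_var(t) else t for t in tp]
            for triple in store.match(*bound):
                new = dict(binding)
                consistent = True
                for term, value in zip(tp, triple):
                    if is_var(term) and new.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(new)
        solutions = next_solutions
    return solutions
```

In this sketch, a pattern with a bound subject or object resolves to a row-key lookup, which mirrors the reason the abstract gives for keeping both tables. Substituting bindings from earlier patterns turns later patterns with shared variables into lookups as well, so evaluating the more selective patterns first keeps intermediate results small.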
Keywords/Search Tags:RDF, semantic data storage, query processing, HBase