Research On RDF Data Storage And Query Based On HBase

Posted on:2014-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhu

Full Text:PDF

GTID:2248330395495483

Subject:Computer application technology

Abstract/Summary:

In recent years, with the development of Semantic Web technology, the Resource Description Framework (RDF), which is used to describe Semantic Web resources, has been widely adopted in various fields. The resulting rapid growth of RDF data has made the efficient management of massive RDF data a key issue. Most existing RDF data management systems store RDF data in relational databases, an approach that struggles to manage massive data efficiently. Studies have shown that relational databases are less efficient than distributed databases when handling massive RDF data. Therefore, more and more researchers use distributed systems and parallel computing techniques to manage massive RDF data.

Massive RDF data management involves two main problems: how to store massive amounts of RDF data effectively, and how to query RDF data efficiently. To address these two problems, this thesis proposes a novel RDF data storage model based on the distributed database HBase and designs basic graph pattern query algorithms for this storage model.

The main work is as follows:

(1) An RDF data storage model based on the distributed database HBase is proposed. The model partitions data according to the classes defined by an ontology described in OWL. Triples that belong to the same class are stored in both the S_PO and O_PS tables of that class. This method makes full use of the row-key index provided by HBase and effectively reduces storage overhead while preserving query performance.

(2) SPARQL query and update operations are implemented on the storage model using the HBase Java API. Triple pattern and Basic Graph Pattern (BGP) query algorithms that support some inference are designed for this storage model. The BGP algorithm is optimized according to the selectivity of triple patterns, shared variables, and the rdf:type property.

(3) The feasibility of the storage model and query algorithms is verified on a stand-alone pseudo-distributed Hadoop system and on a real distributed Hadoop cluster using the LUBM test set. Compared with existing storage models and query algorithms, the proposed method is shown to be effective.
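
The abstract does not give the exact table layout or algorithms, so the following is only a minimal in-memory sketch of the general idea, not the thesis's implementation. It assumes that a triple is assigned to the class named by its subject's rdf:type triple, that S_PO rows are keyed by subject and O_PS rows by object, and that selectivity can be approximated by counting the constants in a triple pattern. The names `ClassPartitionedStore`, `match`, and `evaluate_bgp` are hypothetical, nested dictionaries stand in for HBase tables, and inference is not modeled.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


class ClassPartitionedStore:
    """Dictionary stand-in for per-class S_PO and O_PS tables (not real HBase)."""

    def __init__(self):
        # class -> row key -> column (predicate) -> set of cell values
        self.s_po = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.o_ps = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.class_of = {}  # subject -> class (assumption: one class per subject)

    def load(self, triples):
        triples = list(triples)
        # First pass: learn each subject's class from its rdf:type triple.
        for s, p, o in triples:
            if p == RDF_TYPE:
                self.class_of[s] = o
        # Second pass: write every triple into both tables of its class.
        for s, p, o in triples:
            cls = self.class_of.get(s, "Unknown")
            self.s_po[cls][s][p].add(o)
            self.o_ps[cls][o][p].add(s)

    def match(self, s=None, p=None, o=None, cls=None):
        """Return the set of triples matching a pattern; None is a wildcard."""
        classes = [cls] if cls is not None else list(self.s_po)
        found = set()
        for c in classes:
            if s is not None:
                # Subject bound: row-key lookup in S_PO.
                rows, by_subject = {s: self.s_po.get(c, {}).get(s, {})}, True
            elif o is not None:
                # Only object bound: row-key lookup in O_PS.
                rows, by_subject = {o: self.o_ps.get(c, {}).get(o, {})}, False
            else:
                # Neither bound: scan the whole S_PO table of the class.
                rows, by_subject = self.s_po.get(c, {}), True
            for key, columns in rows.items():
                for pred, values in columns.items():
                    if p is not None and pred != p:
                        continue
                    for v in values:
                        triple = (key, pred, v) if by_subject else (v, pred, key)
                        if o is None or triple[2] == o:
                            found.add(triple)
        return found


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def evaluate_bgp(store, patterns):
    """Join triple patterns, running those with fewer variables first."""
    ordered = sorted(patterns, key=lambda tp: sum(is_var(t) for t in tp))
    solutions = [{}]
    for tp in ordered:
        next_solutions = []
        for binding in solutions:
            # Substitute shared variables that are already bound;
            # unbound variables become None (wildcards).
            lookup = [binding.get(t) if is_var(t) else t for t in tp]
            for triple in store.match(*lookup):
                extended = dict(binding)
                consistent = all(
                    extended.setdefault(term, value) == value
                    for term, value in zip(tp, triple)
                    if is_var(term)
                )
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions
```

In this sketch a pattern with a bound subject or object becomes a single row-key lookup per class table, which is the kind of access the row-key index favors. Passing `cls` to `match` restricts the lookup to one class's tables, a rough analogue of using the rdf:type property to narrow a query.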

Keywords/Search Tags:

RDF, semantic data storage, query processing, HBase

Related items

1	Research And Implementation Of Large Collections Of RDF Data Storage And Retrieval Technology On HBase
2	Research And Implementation Of Storage And Query Techniques On Massive RDF Data
3	Research And Application Of The Storage Of Hbase
4	Research On Storage And Query Processing Of Spatio-temporal Data Based On HBase
5	Ontology Storage And Query Based On HBase
6	Research On Query Optimization With Storage Model For HBase
7	Research Of Big Data Store Query Technology Based On HBase
8	The Research On Big Graph Data Management Based On HBase
9	The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database
10	Study On Large-Scale Semantic Data Storage And Query