Research And Implementation Of Storage And Query Techniques On Massive RDF Data

Posted on:2014-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:J C Song

GTID:2268330392973422

Subject:Computer Science and Technology

Abstract/Summary:

As the volume of data on the Internet grows, it is becoming increasingly difficult for people to access information accurately and quickly. By adding semantic support to the Web, the Semantic Web enables machines to understand the meaning of data and helps people find information quickly. RDF (Resource Description Framework) is the standard data model for exchanging data on the Semantic Web; it describes semantic data as triples of the form <subject, predicate, object>. With the continuous improvement and widespread use of Semantic Web technology, the volume of RDF data has increased sharply. The emergence of massive RDF data poses great challenges to RDF management systems, and how to build a scalable RDF storage and query system has become a research hotspot in the Semantic Web field.

MapReduce is a parallel programming technique for processing massive data that has emerged in recent years, and Hadoop is a well-known open-source implementation of it. Processing massive data on the Hadoop platform has attracted widespread attention. By studying the principles of RDF storage and query and the characteristics of HBase, this paper analyzes the advantages of using HBase to store RDF and presents a design for an HBase-based RDF storage system, which includes the following:

1. Based on the characteristics of RDF storage and query, HBase's index mechanism, and the fact that HBase row keys are sorted in lexicographic order, this paper designs a storage model for RDF in HBase.
2. To address the problem of loading massive RDF data, this paper presents a parallel loading algorithm based on MapReduce that loads data quickly.
3. Based on the RDF storage model in HBase, this paper designs query response strategies for the different kinds of triple pattern. It processes basic graph pattern queries with iterated MapReduce jobs and presents a MapReduce-based join method that uses HBase regions as the data source.

We built an RDF storage system based on HBase and used benchmarks to test and analyze its loading and query performance. The experimental results show that using HBase to implement a storage and query system for massive RDF data is an effective alternative.
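The abstract does not give the actual row key layout, so the following is only a minimal sketch of how sorted row keys can serve triple pattern queries. It assumes three permutations of each triple (SPO, POS, OSP), a common choice in RDF stores but not confirmed as the one used here. A sorted Python list stands in for an HBase table, and a single-process nested-loop join stands in for the MapReduce iterations. All names (`build_indexes`, `choose_index`, `match`, `solve`, `SEP`) are hypothetical.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator; terms must not contain it
# Assumed layout: one sorted "table" per permutation of (subject, predicate, object).
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def build_indexes(triples):
    """Return {order name: sorted list of row keys}, mimicking sorted row keys."""
    return {
        name: sorted(SEP.join(t[i] for i in perm) for t in triples)
        for name, perm in ORDERS.items()
    }


def choose_index(pattern):
    """Pick the permutation in which the bound terms form a key prefix."""
    s, p, o = (term is not None for term in pattern)
    if s and o and not p:
        return "osp"
    if s:
        return "spo"
    if p:
        return "pos"
    if o:
        return "osp"
    return "spo"  # nothing bound: full scan


def match(indexes, pattern):
    """Answer one triple pattern (None = unbound) with a prefix scan."""
    name = choose_index(pattern)
    perm = ORDERS[name]
    bound = []
    for i in perm:
        if pattern[i] is None:
            break
        bound.append(pattern[i])
    prefix = SEP.join(bound)
    if 0 < len(bound) < 3:
        prefix += SEP  # stop "ex:al" from matching "ex:alice"
    keys = indexes[name]
    results = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching range has ended
        parts = key.split(SEP)
        triple = [None] * 3
        for pos, i in enumerate(perm):
            triple[i] = parts[pos]
        # Re-check every bound term (needed when all three are bound).
        if all(b is None or b == v for b, v in zip(pattern, triple)):
            results.append(tuple(triple))
    return results


def solve(indexes, patterns):
    """Join triple patterns one at a time; terms starting with '?' are variables."""
    solutions = [{}]
    for pat in patterns:
        extended = []
        for binding in solutions:
            # Substitute already-bound variables so the scan prefix is longer.
            concrete = tuple(
                binding.get(t) if t.startswith("?") else t for t in pat
            )
            for triple in match(indexes, concrete):
                new = dict(binding)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?") and new.setdefault(term, value) != value:
                        ok = False
                if ok:
                    extended.append(new)
        solutions = extended
    return solutions
```

For example, `match(indexes, ("ex:alice", None, None))` scans only the SPO keys that begin with `ex:alice`, and `solve(indexes, [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")])` evaluates a two-pattern basic graph pattern. In a real deployment each scan would presumably be an HBase range scan and each join step a distributed job; this sketch only illustrates why lexicographically sorted keys make every triple pattern a contiguous range.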

Keywords/Search Tags:

Semantic Web, RDF, distributed storage, HBase, MapReduce

Related items

1	Research And Design Of RDF Storage System Based On HBase
2	Research And Application Of Distributed Storage Technology Based On Semantic Metadata
3	Research And Implementation Of Large Collections Of RDF Data Storage And Retrieval Technology On HBase
4	Research And Implementation Of Large Collections Of RDF Data Distributed Storage On Domain Ontology
5	A Research Of Distributed Storage And Parallel Query Of Spatial Data Based On Hadoop Platform
6	Research On GNSS Data Storage And Retrieval Based On HBASE
7	Research On Radar Data Storage And Analysis Processing Technology
8	Research Of Industrial Internet Of Things Data Storage Strategy Based On HBase
9	Ontology Storage And Query Based On HBase
10	Research On RDF Data Storage And Query Based On HBase