
Research And Implementation Of Storage And Query Techniques On Massive RDF Data

Posted on: 2014-01-10 | Degree: Master | Type: Thesis
Country: China | Candidate: J C Song
GTID: 2268330392973422 | Subject: Computer Science and Technology
Abstract/Summary:
As the volume of data on the Internet grows, it is becoming increasingly difficult for people to find information accurately and quickly. By adding semantic support to the Web, the Semantic Web enables machines to understand the meaning of data and helps people access information quickly. RDF (Resource Description Framework) is the standard data model for exchanging data on the Semantic Web; it describes semantic data as triples of the form <subject, predicate, object>. With the continuous improvement and widespread adoption of Semantic Web technology, the volume of RDF data has grown sharply. The emergence of massive RDF data poses great challenges to RDF management systems, and how to build a scalable RDF storage and query system has become a research hotspot in the Semantic Web field.

MapReduce is a parallel programming model that has emerged in recent years for processing massive data, and Hadoop is a well-known open-source implementation of MapReduce. Using the Hadoop platform to process massive data has attracted widespread attention. Having studied the principles of RDF storage and query as well as the characteristics of HBase, this paper analyzes the advantages of using HBase to store RDF and presents a design for an HBase-based RDF storage system, comprising the following:

1. Based on the characteristics of RDF storage and query, HBase's index mechanism, and the fact that HBase keeps rowkeys sorted in lexicographic (dictionary) order, this paper designs a storage model for RDF in HBase.
2. To address the problem of loading massive RDF data, this paper presents a MapReduce-based parallel loading algorithm that loads data quickly.
3. Based on how RDF is stored in HBase, this paper designs query-answering strategies for the different kinds of triple patterns. It uses iterated MapReduce jobs to process basic graph pattern queries and presents a MapReduce-based join method that uses HBase regions as its data source.

We built an HBase-based RDF storage system and used benchmarks to test and analyze its loading and query performance. The experimental results show that using HBase to store and query massive RDF data is an effective alternative solution.
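The abstract does not spell out the rowkey layout or the per-pattern query strategy. One widely used approach, assumed here and not necessarily the design in this paper, stores every triple three times under the rowkey permutations S|P|O, P|O|S and O|S|P. Because HBase sorts rowkeys lexicographically, any triple pattern whose constants form a leading prefix of one permutation then becomes a single prefix scan. The Python sketch below simulates HBase tables with sorted in-memory lists; the table names `spo`, `pos`, `osp`, the `\x00` separator, and the `SortedTable`, `choose_table` and `TripleStore` names are all hypothetical.

```python
import bisect
from typing import Optional

SEP = "\x00"  # assumed separator; must not occur inside any RDF term

# Hypothetical table layouts: the order in which (s, p, o) positions appear in each rowkey.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class SortedTable:
    """In-memory stand-in for an HBase table whose rowkeys stay lexicographically sorted."""

    def __init__(self) -> None:
        self.keys: list[str] = []

    def put(self, rowkey: str) -> None:
        i = bisect.bisect_left(self.keys, rowkey)
        if i == len(self.keys) or self.keys[i] != rowkey:
            self.keys.insert(i, rowkey)

    def prefix_scan(self, prefix: str) -> list[str]:
        # Start at the first key >= prefix and stop at the first key that lacks the prefix.
        start = bisect.bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            out.append(key)
        return out


def choose_table(bound: set[int]) -> str:
    """Pick the permutation whose leading rowkey components are exactly the bound positions."""
    for name, order in ORDERS.items():
        if set(order[: len(bound)]) == bound:
            return name
    raise ValueError("no permutation covers this pattern")


class TripleStore:
    def __init__(self) -> None:
        self.tables = {name: SortedTable() for name in ORDERS}

    def add(self, s: str, p: str, o: str) -> None:
        # Write the triple once per permutation (three rows in total).
        triple = (s, p, o)
        for name, order in ORDERS.items():
            self.tables[name].put(SEP.join(triple[i] for i in order))

    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> list[tuple[str, str, str]]:
        """Answer one triple pattern; None marks a variable position."""
        pattern = (s, p, o)
        bound = {i for i in range(3) if pattern[i] is not None}
        name = choose_table(bound)
        order = ORDERS[name]
        parts = [pattern[i] for i in order[: len(bound)]]
        # A trailing separator keeps the prefix "alice" from also matching "alicex".
        prefix = SEP.join(parts) + (SEP if 0 < len(bound) < 3 else "")
        results = []
        for key in self.tables[name].prefix_scan(prefix):
            components = key.split(SEP)
            triple = [""] * 3
            for pos, idx in enumerate(order):
                triple[idx] = components[pos]
            # Exact check guards the fully bound case, where no separator is appended.
            if all(pattern[i] is None or pattern[i] == triple[i] for i in range(3)):
                results.append((triple[0], triple[1], triple[2]))
        return results
```

With this layout a pattern such as (?s, knows, bob) is answered from the `pos` table by scanning the prefix `knows\x00bob\x00` rather than the whole table. The price is that every triple is written three times.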
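The iterative join in item 3 can be illustrated with a small in-memory simulation. Assumed here: each MapReduce round is a reduce-side join in which the map phase keys every partial result by the values of the variables it shares with the next triple pattern, and the reduce phase combines the groups that meet under the same key. A plain Python list of triples stands in for the HBase regions scanned as input, and the names `match_pattern`, `mapreduce_join` and `evaluate_bgp` are hypothetical. This is a sketch of the general technique, not the paper's exact algorithm.

```python
from collections import defaultdict
from itertools import product

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(triples: list[Triple], pattern: Triple) -> list[Binding]:
    """Bindings produced by one triple pattern (stands in for scanning HBase regions)."""
    out = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Reject a triple that binds the same variable to two different values.
                if binding.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            out.append(binding)
    return out


def mapreduce_join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """One simulated reduce-side join round on the variables both sides share."""
    shared = sorted(set(left[0]) & set(right[0])) if left and right else []
    # Map + shuffle: group both inputs by the values of the shared variables.
    groups: dict[tuple, tuple[list[Binding], list[Binding]]] = defaultdict(lambda: ([], []))
    for b in left:
        groups[tuple(b[v] for v in shared)][0].append(b)
    for b in right:
        groups[tuple(b[v] for v in shared)][1].append(b)
    # Reduce: within each key group, combine every left binding with every right binding.
    out = []
    for lefts, rights in groups.values():
        for lb, rb in product(lefts, rights):
            out.append({**lb, **rb})
    return out


def evaluate_bgp(triples: list[Triple], patterns: list[Triple]) -> list[Binding]:
    """Iterative evaluation: one join round per additional triple pattern."""
    results = match_pattern(triples, patterns[0])
    for pattern in patterns[1:]:
        results = mapreduce_join(results, match_pattern(triples, pattern))
    return results
```

In this scheme a basic graph pattern with n triple patterns needs n - 1 join rounds, so choosing a pattern order in which each step shares variables with the previous results tends to keep intermediate results small.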
Keywords: Semantic Web, RDF, distributed storage, HBase, MapReduce