Research Of RDF Data Division And Storage Based On Hadoop

Posted on:2014-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:J Cheng

Full Text:PDF

GTID:2248330395995483

Subject:Computer application technology

Abstract/Summary:


The Semantic Web is an extension of the current World Wide Web. It adds machine-readable semantic information to the Web so that computers and people can work together and data can be processed automatically, which improves the efficiency of information retrieval. With the rapid growth of Semantic Web data, however, the storage and retrieval of RDF data face serious challenges. The MapReduce parallel framework and the HBase distributed database of the Hadoop platform meet the requirements of querying and storing massive data. This thesis studies RDF data storage and loading on the Hadoop platform. The main work and contributions are as follows:

(1) We design an OWL-based RDF data storage scheme that uses HBase as the storage medium and stores RDF data in multiple HTables organized according to OWL semantic information. First, the NOSClass and NOSProperty HTables store the OWL semantic information, providing a basis for reasoning and query optimization. Second, for each class defined in the OWL file, an S_PO HTable and an O_PS HTable store the triples of that class. Finally, the NOSType and NOSInstance HTables store the triples whose predicate is "rdf:type".

(2) We design an efficient parallel algorithm for parsing, dividing, and loading RDF data. A MapReduce job first parses the RDF data and divides the triples according to the class to which each triple's subject belongs. The divided triple files are then converted into HFile files, which are loaded into the HBase cluster with the Bulk Load command. Finally, we verify the effectiveness of the proposed parallel parsing and loading algorithm.

(3) We design a hybrid SPARQL optimization algorithm based on selectivity estimation and triple pattern grouping. We first classify the triple patterns into seven types using triple pattern grouping, and then sort the triple patterns within each type using selectivity estimation, which yields the optimized query execution plan. Finally, we verify the effectiveness of the hybrid method.
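The grouping-then-sorting idea in (3) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not say what the seven types are or how selectivity is estimated. The sketch assumes that a type is determined by which of the subject, predicate, and object positions are bound (with at least one variable, that gives seven combinations), that groups with more bound positions run first, and that selectivity comes from per-predicate triple counts. The names PREDICATE_COUNTS, TOTAL_TRIPLES, pattern_type, selectivity, and plan, the 0.1 reduction factor, and all statistics are hypothetical.

```python
from collections import defaultdict

# Hypothetical statistics: triples per predicate (illustrative values only).
PREDICATE_COUNTS = {"rdf:type": 1000, "ub:advisor": 50, "ub:name": 400}
TOTAL_TRIPLES = 2000


def is_var(term):
    """A term is a SPARQL variable if it starts with '?'."""
    return term.startswith("?")


def pattern_type(pattern):
    """Label the pattern by its bound positions, e.g. '?po' or '?p?'."""
    return "".join("?" if is_var(t) else c for t, c in zip(pattern, "spo"))


def selectivity(pattern):
    """Crude estimate of the fraction of triples a pattern matches.

    Assumption: start from the predicate's share of all triples (1.0 if the
    predicate is a variable) and multiply by 0.1 for each bound subject/object.
    """
    s, p, o = pattern
    est = 1.0 if is_var(p) else PREDICATE_COUNTS.get(p, TOTAL_TRIPLES) / TOTAL_TRIPLES
    for term in (s, o):
        if not is_var(term):
            est *= 0.1
    return est


def plan(patterns):
    """Group patterns by type, then sort each group by estimated selectivity."""
    groups = defaultdict(list)
    for pat in patterns:
        groups[pattern_type(pat)].append(pat)
    # Assumed group order: fewer variables first; label breaks ties.
    ordered_types = sorted(groups, key=lambda t: (t.count("?"), t))
    return [pat for t in ordered_types for pat in sorted(groups[t], key=selectivity)]


query = [
    ("?x", "?p", "?o"),
    ("?y", "ub:name", "?n"),
    ("?x", "ub:advisor", "?y"),
    ("?x", "rdf:type", "ub:Student"),
]
for pat in plan(query):
    print(pattern_type(pat), pat)
```

Under these assumptions the rdf:type pattern, which has two bound positions, is scheduled first. The two patterns of type ?p? follow, ordered by their predicate counts, and the fully unbound pattern comes last.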

Keywords/Search Tags:

Semantic Web, OWL, RDF, parallel framework, selectivity estimation


Related items

1	Experimental evaluation of two selectivity estimation methods: Cosine and wavelet transform
2	Research And Implementation Of Large Scale Rule-Based Reasoning For The Semantic Web
3	Research Of Selectivity Estimation Algorithm For String Predicates Based On Modified PST
4	New Approaches To Selectivity Estimation In Database Optimization
5	The Study Of Selectivity Estimation Based On The Key Word Of Road Network
6	Research On XML Cluster Storage & Selectivity Estimation Of Path Expression
7	Distribution-density Based Histograms For Selectivity Estimation
8	Research On Selectivity Estimation Method In Spatial Database
9	SPARQL Query Optimization Based On Predicate Selectivity Estimation
10	Research On Semantic Web Application Framework Specialization:Semantic Web, Semantic Web Application