An Analytical System For Large Scale Semantic Data

Posted on:2014-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:J H Du

Full Text:PDF

GTID:2248330392960488

Subject:Computer application technology

Abstract/Summary:

With the development of the Semantic Web, more and more data are expressed as RDF triples. Storing, querying, and analyzing this large-scale semantic data has become a hot topic.

Traditional triple stores deployed on a single machine have proved effective for storing and retrieving RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. Hadoop, on the other hand, is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing. It has proved to perform well for large-scale data analysis. In recent years, more and more research has combined Hadoop with the storage or retrieval of RDF data, for example by using MapReduce to perform semantic inference or to answer simple SPARQL queries.

In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. A Hadoop cluster is built as the basic infrastructure, and each data node in the cluster has a state-of-the-art triple store installed. First, the original dataset is partitioned into data fragments of similar size, and each fragment is distributed to one node. When a SPARQL query is executed, the query is partitioned according to the same schema, so that each query part relates only to the data on one specific node in the cluster. HadoopRDF organizes all the triple stores in the cluster, distributes each SPARQL query part to its corresponding node, and merges the results returned from each triple store to carry out analytical retrieval over the dataset. Some analytical operations, such as counting, are also designed to extract valuable information from the semantic dataset. HadoopRDF benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL.

Experiments were conducted on BSBM, a standard RDF dataset for testing the storage and retrieval of RDF data. The experimental results show the effectiveness and efficiency of the approach compared with the contrasting methods.
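
The partition, dispatch, and merge flow described in the abstract can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the abstract does not state the partitioning schema, so the sketch hashes on the triple's subject; Python lists stand in for the per-node triple stores; a single triple pattern stands in for a SPARQL query part; and the names node_for, partition, match, run_query, and count_by_predicate are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple
from zlib import crc32

Triple = Tuple[str, str, str]
# A triple pattern: None in a position acts as a wildcard (like a SPARQL variable).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def node_for(subject: str, num_nodes: int) -> int:
    """Map a subject to a node index (assumed schema: hash of the subject)."""
    return crc32(subject.encode("utf-8")) % num_nodes


def partition(triples: List[Triple], num_nodes: int) -> List[List[Triple]]:
    """Split the dataset into one fragment per node using the schema above."""
    fragments: List[List[Triple]] = [[] for _ in range(num_nodes)]
    for triple in triples:
        fragments[node_for(triple[0], num_nodes)].append(triple)
    return fragments


def match(fragment: List[Triple], pattern: Pattern) -> List[Triple]:
    """Stand-in for a local triple store: evaluate one pattern on one fragment."""
    return [
        triple for triple in fragment
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


def run_query(fragments: List[List[Triple]], pattern: Pattern) -> List[Triple]:
    """Send the pattern only to the nodes that can hold matches, then merge."""
    subject = pattern[0]
    if subject is not None:
        # A bound subject pins the query part to exactly one node.
        targets = [node_for(subject, len(fragments))]
    else:
        # An unbound subject must be evaluated on every node.
        targets = list(range(len(fragments)))
    merged: List[Triple] = []
    for index in targets:
        merged.extend(match(fragments[index], pattern))
    return merged


def count_by_predicate(fragments: List[List[Triple]]) -> Counter:
    """Analytical count: per-node partial counts merged into a global total."""
    total: Counter = Counter()
    for fragment in fragments:
        total.update(triple[1] for triple in fragment)
    return total
```

Because the query is routed with the same function that placed the data, a pattern with a bound subject touches one node only, while a pattern with an unbound subject is evaluated on every node and the partial results are concatenated. Hashing yields fragments of roughly similar size only when subjects are fairly evenly distributed.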

Keywords/Search Tags:

Large Scale, Distributed Storage and Retrieval, SPARQL, Semantic Data Analysis

Related items

1	Enabling large-scale storage and retrieval of whole slide images: A big data approach
2	Large Scale Video Retrieval And Feedback With Multi-level Content Representation
3	The Research On Large-Scale Distributed Storage Technology
4	Research On Large-scale RDF Data Query Method Based On Graph Clustering
5	A High-performance Distributed Storage System For Large-scale High-definition Video Data
6	Research On Distributed Storage And Retrieval Technology Of Large-scale Knowledge Graph
7	Hash-based Search Over Large-scale Police Facial Images
8	Research On The Data Redundancy Technique For The Large-scale And Distributed Storage Systems
9	The Design And Implement Of A Large-scale Key-value Distributed Storage System
10	Research On Technology Of Content-Based Large-Scale Image Retrieval