An Analytical System For Large Scale Semantic Data

Posted on: 2014-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: J H Du
Full Text: PDF
GTID: 2248330392960488
Subject: Computer application technology
Abstract/Summary:
With the development of the Semantic Web, more and more data are expressed in the form of RDF triples. Storing, querying, and analyzing this large-scale semantic data has become a hot topic.

Traditional triple stores deployed on a single machine have proved effective for the storage and retrieval of RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. On the other hand, Hadoop is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing. It has proved to perform well for large-scale data analysis.

In recent years, more and more research has combined Hadoop with the storage or retrieval of RDF data, for example by using MapReduce to perform semantic inference or to answer simple SPARQL queries. In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. A Hadoop cluster is built as the basic infrastructure, and each data node in the cluster has a state-of-the-art triple store installed. First, the original dataset is partitioned into data fragments of similar size, and each fragment is distributed to one node. When a SPARQL query is executed, the query is partitioned according to the same scheme, so that each query part relates only to the data on one specific node in the cluster. HadoopRDF organizes all the triple stores in the cluster, distributes each SPARQL query part to its corresponding node, and merges the results returned from each triple store to carry out analytical retrieval over the dataset. Some analytical operations, such as counting, are also designed to gain valuable information from the semantic dataset.

HadoopRDF benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experiments have been conducted on BSBM, a standard RDF dataset for testing the storage and retrieval of RDF data. The experimental results show the effectiveness and efficiency of the approach compared with the contrasting methods.
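The partition, route, and merge flow described in the abstract can be illustrated with a minimal Python sketch. It is not the HadoopRDF implementation: the abstract does not state the partitioning scheme, so hashing on the triple's subject is an assumption here (and hashing alone does not guarantee fragments of similar size). In-memory lists stand in for the per-node triple stores, and all names (`node_for`, `partition`, `lookup_subject`, `count_by_object`, the `ex:` data) are hypothetical.

```python
from collections import Counter
from zlib import crc32


def node_for(subject: str, num_nodes: int) -> int:
    """Map a subject to a node index (assumed scheme: hash on subject)."""
    # crc32 is deterministic across runs, so data placement and
    # query routing always agree on the target node.
    return crc32(subject.encode("utf-8")) % num_nodes


def partition(triples, num_nodes):
    """Split (subject, predicate, object) triples into one fragment per node."""
    fragments = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        fragments[node_for(s, num_nodes)].append((s, p, o))
    return fragments


def lookup_subject(fragments, subject):
    """Query part with a bound subject: only one node has to be asked."""
    fragment = fragments[node_for(subject, len(fragments))]
    return [(p, o) for s, p, o in fragment if s == subject]


def count_by_object(fragments, predicate):
    """Analytical count: each node counts locally, then results are merged."""
    total = Counter()
    for fragment in fragments:  # stand-in for one triple store per node
        local = Counter(o for _, p, o in fragment if p == predicate)
        total.update(local)     # merge step
    return total


# Hypothetical sample data, loosely in the style of a product catalogue.
triples = [
    ("ex:product1", "rdf:type", "ex:Product"),
    ("ex:product1", "ex:producer", "ex:acme"),
    ("ex:product2", "rdf:type", "ex:Product"),
    ("ex:product2", "ex:producer", "ex:acme"),
    ("ex:product3", "rdf:type", "ex:Product"),
    ("ex:product3", "ex:producer", "ex:globex"),
]

fragments = partition(triples, 3)
producer_counts = count_by_object(fragments, "ex:producer")
product1_facts = lookup_subject(fragments, "ex:product1")
```

Because every triple with the same subject lands in the same fragment, a query part that binds the subject can be answered by a single node, while an aggregate such as the producer count is computed per node and merged afterwards. In a real deployment, each fragment would be loaded into a triple store and the local step would be a SPARQL query against it.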
Keywords/Search Tags: Large Scale, Distributed Storage and Retrieval, SPARQL, Semantic Data Analysis