
Storage of large RDF graphs using Hadoop and their retrieval using Pig

Posted on: 2011-04-18
Degree: M.S.
Type: Thesis
University: The University of Texas at Dallas
Candidate: Doshi, Pankil
Full Text: PDF
GTID: 2448390002968321
Subject: Information Technology
Abstract/Summary:
The Semantic Web and cloud computing are emerging technologies that attract considerable research attention. Analyzing huge amounts of data has always been a challenge, and this holds for Semantic Web data as well. RDF (Resource Description Framework) is a standard model for data interchange on the Web, standardized by the World Wide Web Consortium (W3C), but current Semantic Web frameworks do not scale to large RDF graphs. Storing and querying large RDF graphs is therefore a significant challenge. This thesis focuses on pre-processing and storing large RDF graphs in such a way that querying becomes simpler. We describe a pre-processing framework built on Hadoop that exploits the cloud-computing paradigm: we use Hadoop's MapReduce programming model and software framework to pre-process large RDF graphs and store them in the Hadoop Distributed File System (HDFS). The data stored in HDFS is then queried using the capabilities of the open-source Pig platform. Pig Latin is a high-level procedural language for processing large-scale structured data on the Hadoop MapReduce platform. We manually convert a given SPARQL query into a Pig Latin script and run it over the pre-processed data.
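As a sketch of the manual SPARQL-to-Pig conversion the abstract describes, the Pig Latin script below evaluates a simple two-pattern query, SELECT ?name WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?name }. The HDFS paths, the tab-separated (subject, predicate, object) storage layout, and the FOAF vocabulary are illustrative assumptions, not the thesis's actual schema or data set.

    -- Assumed layout: triples stored in HDFS as tab-separated
    -- (subject, predicate, object) lines; paths are hypothetical.
    triples = LOAD '/user/rdf/triples' USING PigStorage('\t')
              AS (s:chararray, p:chararray, o:chararray);

    -- Triple pattern 1: ?x rdf:type foaf:Person
    persons = FILTER triples BY
        p == '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' AND
        o == '<http://xmlns.com/foaf/0.1/Person>';

    -- Triple pattern 2: ?x foaf:name ?name
    names = FILTER triples BY p == '<http://xmlns.com/foaf/0.1/name>';

    -- The shared variable ?x becomes an equi-join on the subject column
    joined = JOIN persons BY s, names BY s;

    -- SELECT ?name projects the object column of the second pattern
    result = FOREACH joined GENERATE names::o AS name;

    STORE result INTO '/user/rdf/out/person_names';

Pig compiles the FILTER, JOIN, and FOREACH operators into a chain of MapReduce jobs automatically, which is what makes this hand translation of SPARQL queries tractable at large scale.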
Keywords/Search Tags: Large RDF graphs, Semantic Web, Hadoop, Pig, SPARQL