
Storage of large RDF graphs using Hadoop and their retrieval using Pig

Posted on: 2011-04-18
Degree: M.S.
Type: Thesis
University: The University of Texas at Dallas
Candidate: Doshi, Pankil
Full Text: PDF
GTID: 2448390002968321
Subject: Information Technology
Abstract/Summary:
The Semantic Web and cloud computing are emerging technologies that attract considerable research attention. Analyzing huge amounts of data has always been a challenge, and this holds for Semantic Web data as well. RDF (Resource Description Framework) is a standard model for data interchange on the Web, standardized by the World Wide Web Consortium (W3C), but current Semantic Web frameworks do not scale to large RDF graphs. Storing and querying large RDF graphs is therefore a significant challenge. This thesis focuses on pre-processing and storing large RDF graphs in such a way that querying becomes simpler. We describe a pre-processing framework built on Hadoop that exploits the cloud-computing paradigm: we use Hadoop's MapReduce programming model and software framework to pre-process large RDF graphs and store them in the Hadoop Distributed File System (HDFS). The data stored in HDFS is then queried using the capabilities of the open-source Pig platform. Pig Latin is a high-level procedural language for processing large-scale structured data on the Hadoop MapReduce platform. We manually convert a given SPARQL query into a Pig Latin script and run it over the pre-processed data.
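As a sketch of the manual SPARQL-to-Pig conversion the abstract describes, the Pig Latin script below evaluates a simple two-pattern query, SELECT ?name WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?name }. The HDFS paths, the tab-separated (subject, predicate, object) storage layout, and the FOAF vocabulary are illustrative assumptions, not the thesis's actual schema or data set.

    -- Assumed layout: triples stored in HDFS as tab-separated
    -- (subject, predicate, object) lines; paths are hypothetical.
    triples = LOAD '/user/rdf/triples' USING PigStorage('\t')
              AS (s:chararray, p:chararray, o:chararray);

    -- Triple pattern 1: ?x rdf:type foaf:Person
    persons = FILTER triples BY
        p == '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' AND
        o == '<http://xmlns.com/foaf/0.1/Person>';

    -- Triple pattern 2: ?x foaf:name ?name
    names = FILTER triples BY p == '<http://xmlns.com/foaf/0.1/name>';

    -- The shared variable ?x becomes an equi-join on the subject column
    joined = JOIN persons BY s, names BY s;

    -- SELECT ?name projects the object column of the second pattern
    result = FOREACH joined GENERATE names::o AS name;

    STORE result INTO '/user/rdf/out/person_names';

Pig compiles the FILTER, JOIN, and FOREACH operators into a chain of MapReduce jobs automatically, which is what makes this hand translation of SPARQL queries tractable at large scale.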
Keywords/Search Tags: Large RDF graphs, Semantic Web, Hadoop, Pig, SPARQL