
Towards supporting big data computing using NoSQL systems and the MapReduce paradigm

Posted on: 2015-11-13
Degree: Ph.D
Type: Thesis
University: State University of New York at Binghamton
Candidate: Dede, Elif
Full Text: PDF
GTID: 2478390017489082
Subject: Computer Science
Abstract/Summary:
The progressive transition in the nature of both scientific and industrial datasets has been the driving force behind the development of, and research interest in, NoSQL data stores. Loosely structured data poses a challenge to traditional database systems, which are often considered impractical and expensive for data that fits the NoSQL model. As the quantity of unstructured data grows, so does the demand for a processing pipeline that can seamlessly combine the NoSQL storage model with a "Big Data" processing platform such as MapReduce. Although MapReduce is the paradigm of choice for data-intensive computing, Java-based frameworks such as Hadoop require users to write MapReduce code in Java. In this thesis, we identify the limitations of running language-independent applications with current off-the-shelf MapReduce frameworks and propose our own approach, MARISSA, to address these deficiencies. We then provide a performance study of using MapReduce to process datasets stored in various NoSQL systems. First, we examine the performance of traditional MapReduce workloads, implemented in Java, running on top of NoSQL stores. Similarly, we show that for legacy C/C++ applications and other language-independent executables, there is a need to allow NoSQL data stores to exploit the strengths of the MapReduce paradigm. We therefore present the design details and performance implications of alternative approaches to integrating NoSQL data stores with MapReduce platforms that support language-independent schemes. We present the advantages and disadvantages of each approach and provide recommendations for individual workloads and use-case scenarios.
Keywords/Search Tags: Data, MapReduce, NoSQL, Systems