Font Size: a A A

I/O-Efficient Data-Intensive Computing

Posted on:2014-01-16Degree:Ph.DType:Dissertation
University:University of California, San DiegoCandidate:Rasmussen, Alexander CarlinFull Text:PDF
GTID:1458390005989932Subject:Computer Science
Abstract/Summary:
There is a growing need for scalable, data-intensive processing platforms to analyze and filter large volumes of data. The effectiveness of these systems is measured by the quantity and quality of data that they can process in a reasonable amount of time; thus, these systems have very high I/O and storage requirements.;Existing systems are very effective at scaling to large cluster sizes. Unfortunately, there exists a significant gap between the performance these systems provide and the underlying capacity of the hardware infrastructure on which they are deployed.;In this dissertation, we endeavor to bridge this performance gap by focusing on efficient I/O as a first-class architectural concern. In particular, we present two systems, TritonSort and Themis. TritonSort is a high-performance large-scale sorting system capable of sorting 100TB of data on a modestly-sized cluster at about 82% of that cluster's peak hardware performance. Themis is a successor system to TritonSort that supports the popular MapReduce programming paradigm and can run a wide spectrum of MapReduce jobs at nearly the speed at which TritonSort can sort. We conclude with the implementation of a fault tolerance scheme for Themis that provides proportional fault tolerance without imposing additional rounds of I/O during common-case operation.
Keywords/Search Tags:I/O, Data
Related items