
Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments

Posted on: 2015-03-07
Degree: Ph.D.
Type: Thesis
University: Indiana University
Candidate: Gunarathne, Thilina
GTID: 2478390017498931
Subject: Computer Science
Abstract/Summary:
Over the last decade, three major disruptive trends driven by the software industry have altered the scalable parallel computing landscape: the data deluge (i.e., the shift from compute-intensive to data-intensive computing), next-generation compute and storage frameworks based on MapReduce, and the utility computing model introduced by cloud computing environments. This thesis focuses on the intersection of these three disruptions and evaluates the feasibility of using cloud computing environments to perform large-scale, data-intensive computations with next-generation programming and execution frameworks. The key current challenges for scalable parallel computing in cloud environments include identifying suitable application patterns, identifying efficient and easy-to-use programming abstractions to represent those patterns, performing appropriate task partitioning and task scheduling, identifying suitable data storage and staging architectures, using suitable communication patterns, and identifying appropriate fault-tolerance mechanisms.

This thesis identifies three types of application patterns that are well suited for cloud environments. First, it presents pleasingly parallel computations, including pleasingly parallel programming frameworks for cloud environments. Second, it explores MapReduce-type applications, including a decentralized architecture and a prototype implementation for developing MapReduce frameworks using cloud infrastructure services. Third, it considers data-intensive iterative applications, which encompass many graph processing algorithms, machine-learning algorithms, and more. We present the Twister4Azure architecture and runtime as a solution for implementing data-intensive iterative applications in cloud environments.
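As a point of reference for the MapReduce-type applications described above, the following is a minimal single-process Python sketch of the generic MapReduce programming model: a map phase, a group-by-key shuffle, and a reduce phase. It is not code from the thesis or from any of its frameworks; the function names are hypothetical and word counting is an assumed example workload. A pleasingly parallel computation corresponds roughly to the map phase on its own, with no shuffle or reduce.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical, sequential illustration of the MapReduce model."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record is processed independently and emits (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key: str, counts: List[int]) -> int:
    # Add up the partial counts emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    docs = ["the cloud", "The data deluge", "cloud data"]
    print(map_reduce(docs, word_mapper, sum_reducer))
    # {'the': 2, 'cloud': 2, 'data': 2, 'deluge': 1}
```

In a real framework the map tasks would run on separate workers and the shuffle would move intermediate data across the network. Everything here runs sequentially so the three phases stay easy to follow.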
The Twister4Azure architecture extends the familiar, easy-to-use MapReduce programming model with iterative extensions and iteration-specific optimizations. This enables a wide array of large-scale iterative and non-iterative data analysis and scientific applications to use cloud platforms easily, efficiently, and in a fault-tolerant manner.

Collective communication operations optimize communication and coordination among groups of nodes in a distributed computation, which brings several advantages. We also present the applicability of collective communication operations to iterative MapReduce computations in cloud and cluster environments, enriching these computations with additional application patterns without sacrificing the desirable properties of the MapReduce model. Adding collective communication operations enhances the iterative MapReduce model with performance improvements and ease-of-use advantages.
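To make the iterative pattern and the role of a collective operation more concrete, here is a minimal single-process Python sketch of iterative MapReduce, using k-means clustering as an assumed example of the machine-learning workloads mentioned above. It is not the Twister4Azure API; `kmeans_map`, `all_reduce_sum` and `iterative_kmeans` are hypothetical names. The sketch assumes that the input partitions are loop-invariant and stay in memory across iterations, while the small loop-variant state (the centroids) is broadcast to every map task. The per-task partial sums are then combined in an AllReduce-style step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans_map(partition: Sequence[Point], centroids: Sequence[Point]) -> List[List[float]]:
    # Map task: assign each cached point to its nearest broadcast centroid and
    # emit per-centroid partial sums [sum_x, sum_y, count].
    k = len(centroids)
    partial = [[0.0, 0.0, 0] for _ in range(k)]
    for x, y in partition:
        j = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
        partial[j][0] += x
        partial[j][1] += y
        partial[j][2] += 1
    return partial


def all_reduce_sum(partials: List[List[List[float]]]) -> List[List[float]]:
    # Element-wise sum of all map outputs. In a distributed AllReduce, every
    # worker would end up holding this combined result for the next iteration.
    k = len(partials[0])
    total = [[0.0, 0.0, 0] for _ in range(k)]
    for p in partials:
        for j in range(k):
            for i in range(3):
                total[j][i] += p[j][i]
    return total


def iterative_kmeans(partitions: List[List[Point]], centroids: List[Point],
                     max_iters: int = 20, tol: float = 1e-9) -> List[Point]:
    # `partitions` plays the role of loop-invariant input reused every iteration.
    for _ in range(max_iters):
        partials = [kmeans_map(part, centroids) for part in partitions]  # map phase
        totals = all_reduce_sum(partials)  # collective combine
        new_centroids = [
            (sx / n, sy / n) if n else c  # keep an empty cluster's centroid unchanged
            for (sx, sy, n), c in zip(totals, centroids)
        ]
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:  # converged: centroids stopped moving
            break
    return centroids


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(iterative_kmeans(parts, [(0.0, 0.0), (10.0, 10.0)]))
    # [(0.0, 0.5), (10.0, 10.5)]
```

The design point the sketch tries to capture is the split between large data that is reused unchanged across iterations and small state that changes every iteration and must be shared with all tasks. A collective such as AllReduce is one natural way to express that sharing step.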
Keywords/Search Tags: Scalable parallel computing, Iterative, MapReduce, Cloud, Collective communication operations, Computations, Data, Architecture