Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments

Posted on:2015-03-07

Degree:Ph.D

Type:Thesis

University:Indiana University

Candidate:Gunarathne, Thilina

GTID:2478390017498931

Subject:Computer Science

Abstract/Summary:

Over the last decade, three major disruptive trends driven by the software industry altered the scalable parallel computing landscape. These disruptions are the data deluge (i.e., the shift from compute-intensive to data-intensive computing), next-generation compute and storage frameworks based on MapReduce, and the utility computing model introduced by cloud computing environments. This thesis focuses on the intersection of these three disruptions and evaluates the feasibility of using cloud computing environments to perform large-scale, data-intensive computations using next-generation programming and execution frameworks. The current key challenges for performing scalable parallel computing in cloud environments include identifying suitable application patterns, identifying efficient and easy-to-use programming abstractions to represent those patterns, performing appropriate task partitioning and task scheduling, identifying suitable data storage and staging architectures, utilizing suitable communication patterns, and identifying appropriate fault tolerance mechanisms.

This thesis identifies three types of application patterns that are well suited for cloud environments. Presented first are pleasingly parallel computations, including pleasingly parallel programming frameworks for cloud environments. Secondly, MapReduce-type applications are explored, including a decentralized architecture and a prototype implementation to develop MapReduce frameworks using cloud infrastructure services. Third and finally, data-intensive iterative applications, which encompass many graph processing algorithms, machine-learning algorithms, and more, are considered. We present the Twister4Azure architecture and runtime as a solution for implementing data-intensive iterative applications in cloud environments. The Twister4Azure architecture extends the familiar, easy-to-use MapReduce programming model with iterative extensions and iteration-specific optimizations, enabling a wide array of large-scale iterative and non-iterative data analysis and scientific applications to utilize cloud platforms easily and efficiently in a fault-tolerant manner.

Collective communication operations facilitate optimized communication and coordination among groups of nodes in distributed computations. We also present the applicability of collective communication operations to iterative MapReduce computations on cloud and cluster environments, enriching these computations with additional application patterns without sacrificing the desirable properties of the MapReduce model. The addition of collective communication operations enhances the iterative MapReduce model by offering many performance improvements and ease-of-use advantages.

Keywords/Search Tags:

Scalable parallel computing, Iterative, MapReduce, Cloud, Collective communication operations, Computations, Data, Architecture

Related items

1	Research On Iterative Computations For Big Data In The Cloud
2	The Research And Implementation Of Diversity Demand Oriented Parallel Computing Model
3	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
4	Study On Parallel Algorithm Of Large-scale Numerical Calculation In Cloud Computing Environment
5	Research On Mapreduce Parallel Computing Platform For Cloud Computing
6	Research On MapReduce Parallel Programming Model In The Cloud Computing
7	Researches And Application Of Mapreduce Parallel Programming Model For Cloud Computing
8	Research On Collective Computing Architecture And Anonymous Incentive Technology
9	Parallel Program Execution Model On Data Communication Optimization
10	General Cloud-native Big Data Architecture With Kubernetes