Towards improving performance and reliability of Cloud platforms

Posted on: 2014-12-27

Degree: Ph.D.

Type: Thesis

University: The Pennsylvania State University

Candidate: Sharma, Bikash

Full Text: PDF

GTID: 2458390005498754

Subject: Engineering

Abstract/Summary:

Cloud computing refers both to applications delivered as services over the Internet and to the hardware and system software in the data centers that provide these services. It has emerged as one of the most versatile forms of utility computing, in which both applications and infrastructure can be leased from a seemingly infinite pool of computing resources under fine-grained, pay-as-you-go billing. Recent years have seen unprecedented growth in cloud services from major providers such as Amazon, Google, IBM and Microsoft. The characteristics that make cloud computing attractive, and that distinguish it from traditional distributed systems such as data centers and grids, include self-organization, elasticity, multi-tenancy, a seemingly infinite resource pool and flexible economies of scale.

The promises and benefits of cloud computing come with associated challenges. Among these, performance and reliability issues are critical and form the main focus of this dissertation. Specifically, the dissertation addresses four performance and reliability concerns in clouds: (i) the lack of representative cloud workloads and performance benchmarks, which are essential for evaluating the characteristics of cloud systems; (ii) poor resource management in big data cloud clusters running representative large-scale data processing applications such as Hadoop MapReduce, which stems from the static, fixed-size, coarse-grained and uncoordinated resource allocation in the Hadoop framework; (iii) inefficient scheduling and resource management for a representative workload mix of interactive and batch applications running on hybrid data centers, which consist of both native and virtual machines; and (iv) the lack of an effective end-to-end problem determination and diagnosis framework for virtualized cloud platforms, which is essential to enhancing the reliability of the cloud infrastructure and hosted services.

To this end, the dissertation explores the motivations underlying these problems, proposes methodologies and solutions, evaluates them through extensive experimental and empirical analyses, and lays foundations for future research. The first chapter focuses on the characterization and modeling of cloud workloads. In particular, it models and synthesizes an important workload property called task placement constraints, and demonstrates their significant impact on scheduling performance in terms of the task pending delays incurred. The second chapter describes an efficient dynamic resource management framework, called MROrchestrator, which alleviates the downsides of slot-based resource allocation in Hadoop MapReduce clusters. MROrchestrator monitors and analyzes the execution-time resource footprints of the constituent map and reduce tasks, and constructs run-time performance models of tasks as a function of their resource allocations, thereby improving application performance and boosting cluster resource utilization. The third chapter proposes HybridMR, a hierarchical MapReduce scheduler for hybrid data centers. HybridMR operates in two hierarchical phases. The first phase guides the placement of MapReduce jobs on native or virtual machines, based on the expected virtualization overheads. The second phase manages the run-time resource allocations of interactive applications and collocated batch MapReduce jobs, with the objective of providing best-effort delivery to the MapReduce jobs while complying with the Service Level Agreements (SLAs) of the interactive services. Finally, the fourth chapter addresses cloud reliability in the context of efficient problem determination and diagnosis in virtualized cloud platforms, through a novel fault management framework called CloudPD. CloudPD adopts a three-level hierarchical architecture, consisting of a lightweight event generation phase, a comprehensive problem determination phase, and an expert-knowledge-driven problem diagnosis phase, to provide accurate and fast localization of various anomalies in clouds.

Overall, the dissertation maintains that performance and reliability are critical aspects of cloud computing environments that need to be addressed through effective, novel research and methodical evaluation.
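
As a rough illustration of the first HybridMR phase described above (placing MapReduce jobs on native or virtual machines according to expected virtualization overhead), the following Python sketch applies a simple threshold rule. The abstract does not state the actual placement policy, so the `Job` type, the `expected_virt_overhead` estimate, the `place_job` function and the `overhead_threshold` value are all hypothetical names and assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Hypothetical estimate: fractional slowdown expected when the job
    # runs on a virtual machine instead of a native one (0.10 = 10% slower).
    expected_virt_overhead: float


def place_job(job: Job, overhead_threshold: float = 0.2) -> str:
    """Return "virtual" or "native" for a job (illustrative rule only).

    Assumption: a job tolerates virtualization when its expected overhead
    is at or below the threshold; otherwise it goes to a native machine.
    """
    if job.expected_virt_overhead <= overhead_threshold:
        return "virtual"
    return "native"


def place_all(jobs, overhead_threshold: float = 0.2) -> dict:
    """Map each job name to its placement decision."""
    return {job.name: place_job(job, overhead_threshold) for job in jobs}


# Two made-up jobs with made-up overhead estimates.
jobs = [
    Job("job_a", expected_virt_overhead=0.05),
    Job("job_b", expected_virt_overhead=0.40),
]
placements = place_all(jobs)
print(placements)
```

A single threshold is the simplest possible decision rule; a real scheduler would presumably also weigh machine availability and the second-phase SLA constraints, which this sketch leaves out.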

Keywords/Search Tags:

Cloud, Performance, Applications, Services, Resource, Data centers

Related items

1	Energy efficient resource allocation for virtual network services with dynamic workload in cloud data centers
2	Performance Analysis Of Cloud Computing Centers Engaged In Big Data Applications
3	Research Of Resource Management On Cloud Data Centers
4	Resource Allocation And Scheduling In Cloud Data Center Networks
5	Energy efficient Data Centers for on-demand cloud services
6	Research Of Energy-efficient Scheduling And Resource Management On Cloud Data Centers
7	Performance Modeling of Cloud Computing Centers
8	Performance Interference And Resource Allocation Optimization In Co-located Data Centers
9	Research On Virtual Resource Allocation Strategy In Cloud Data Centers
10	Research On The Efficient Resource Scheduling In Cloud Data Centers