Optimizing response time for distributed applications in public clouds

Posted on:2016-02-08

Degree:Ph.D

Type:Dissertation

University:Cornell University

Candidate:Zou, Tao

Full Text:PDF

GTID:1478390017470382

Subject:Computer Science

Abstract/Summary:

An increasing number of distributed data-driven applications are moving into public clouds. By sharing resources and operating at large scale, public clouds promise higher utilization and lower costs than private clusters. In addition, the flexible resource allocation and billing methods offered by public clouds enable tenants to control the response time, or time-to-solution, of their applications.

To achieve high utilization, however, cloud providers inevitably place virtual machine instances non-contiguously, i.e., instances of a given application may end up on physically distant machines in the cloud. This allocation strategy leads to significant heterogeneity in average network latency between instances. Also, virtualization and the shared use of network resources among tenants result in network latency jitter. We observe that network latency heterogeneity and jitter in the cloud can greatly increase the time required for communication in these distributed data-driven applications, which leads to significantly worse response time.

To improve response time under latency jitter, we propose a general parallel framework that exposes a high-level, data-centric programming model. We design a jitter-tolerant runtime that exploits this programming model to absorb latency spikes transparently by (1) carefully scheduling computation and (2) replicating data and computation. To improve response time with heterogeneous mean latency, we present ClouDiA, a general deployment advisor that selects application node deployments minimizing either (1) the largest latency between application nodes, or (2) the longest critical path among all application nodes.

We also describe how to effectively control response time for interactive data analytics in public clouds. We introduce Smart, the first elastic cloud resource manager for in-memory interactive data analytics. Smart enables control of the speed of queries by letting users specify the number of compute units per GB of data processed, and quickly reacts to speed changes by adjusting the amount of resources allocated to the user. We then describe SmartShare, an extension of Smart that can serve multiple data scientists simultaneously to obtain additional cost savings without sacrificing query performance guarantees. Taking advantage of the workload characteristics of interactive data analysis, such as think time and overlap between datasets, we are able to further improve resource utilization and reduce cost.
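
The first ClouDiA objective named in the abstract, minimizing the largest latency between application nodes, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it uses plain exhaustive search, which is only feasible for tiny inputs. The latency matrix, the communication graph, the function names, and the choice to have one more instance available than application nodes are all assumptions made for illustration.

```python
from itertools import permutations


def longest_link(assignment, edges, latency):
    """Largest latency over all communicating node pairs under an assignment.

    assignment[i] is the instance that hosts application node i.
    """
    return max(latency[assignment[u]][assignment[v]] for u, v in edges)


def best_deployment(num_nodes, edges, latency):
    """Try every node-to-instance assignment; return (cost, assignment).

    Exhaustive search stands in for a real solver and scales factorially.
    """
    instances = range(len(latency))
    best = None
    for assignment in permutations(instances, num_nodes):
        cost = longest_link(assignment, edges, latency)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical mean latencies (ms) between four allocated instances.
latency = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.3],
    [0.9, 0.7, 0.0, 0.4],
    [0.8, 0.3, 0.4, 0.0],
]
# Hypothetical communication graph: a chain of three application nodes.
edges = [(0, 1), (1, 2)]

cost, assignment = best_deployment(3, edges, latency)
print(cost, assignment)
```

In this toy input the middle node of the chain lands on the instance with the two cheapest links, and the slow instance is left unused. The second objective (longest critical path) would replace `longest_link` with a path-length computation over a directed communication graph.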

Keywords/Search Tags:

Public clouds, Time, Applications, Data, Distributed, Resource

Related items

1	Research On Data Placement And Fault-Tolerant Scheduling For Applications Of Data Stream In Geo-distributed Clouds
2	Supporting time-critical event processing in grids and clouds
3	Efficient data scheduling for real-time large-scale data-intensive distributed applications
4	Automatic enablement, coordination and resource usage prediction of unmodified applications on clouds
5	Distributed Approaches For Finite-horizon Resource Allocation And Their Applications
6	A QoS-driven resource allocation framework based on the risk incursion function and its incorporation into a middleware architecture and mechanisms supporting distributed fault-tolerant real-time computing applications
7	Task Feature Based Trading Strategy For Resource Efficiency In Public IaaS Clouds
8	Resource management for data streaming applications
9	Emulation System Of Climate Based On Clouds
10	Shanghai R & D Public Service Platform For Applications And Database Systems Planning And Research