Data placement in widely distributed systems

Posted on: 2006-09-14

Degree: Ph.D.

Type: Dissertation

University: The University of Wisconsin - Madison

Candidate: Kosar, Tevfik

GTID: 1458390008470217

Subject: Computer Science

Abstract/Summary:

The unbounded growth in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In such an environment, data is no longer locally accessible and must therefore be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in a widely distributed environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such "data placements" also require careful management of storage resources and data movement, i.e., allocating storage space, staging in input data, staging out generated data, and de-allocating local storage after the data is safely stored at the destination.

Existing systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, where it delays the computation, or performed by simple scripts that do not have the privileges of a job. In this dissertation, we propose a framework that decouples computation and data placement, allows asynchronous execution of each, and treats data placement as a full-fledged job that can be queued, scheduled, monitored, and checkpointed like computational jobs. We regard data placement as an important part of the end-to-end process, and express this in a workflow language.

Because data placement jobs have different semantics and characteristics than computational jobs, not all traditional techniques applied to computational jobs apply to data placement jobs. We analyze different scheduling strategies for data placement and introduce a batch scheduler specialized for data placement. This scheduler implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources.

We provide a complete data placement subsystem for distributed computing systems, similar to the I/O subsystem in operating systems. This system offers transparent failure handling, reliable and efficient scheduling of data resources, load balancing on the storage servers, and traffic control on network links. It provides policy support, improves fault tolerance, and enables higher-level optimizations, including maximizing application throughput.
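
The idea of treating each data placement step as a job in its own right, queued separately from computation but ordered by workflow dependencies, can be illustrated with a minimal sketch. This is not the dissertation's scheduler or workflow language: the `Job` class, the `run_workflow` function, and the step names (`allocate`, `stage_in`, `compute`, `stage_out`, `release`) are hypothetical names chosen for illustration, and "running" a job here only records it as done.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job record; not taken from the dissertation.
    name: str
    kind: str                      # "data" (placement) or "compute"
    depends_on: list = field(default_factory=list)


def run_workflow(jobs):
    """Run jobs in dependency order, keeping a separate queue per job kind."""
    queues = {"data": deque(), "compute": deque()}
    for job in jobs:
        queues[job.kind].append(job)

    done, order = set(), []
    while any(queues.values()):
        progressed = False
        for kind, queue in queues.items():
            # Scan the queue once; run any job whose dependencies are done.
            for _ in range(len(queue)):
                job = queue.popleft()
                if all(dep in done for dep in job.depends_on):
                    done.add(job.name)          # stand-in for real execution
                    order.append((kind, job.name))
                    progressed = True
                else:
                    queue.append(job)           # not ready yet, requeue
        if not progressed:
            raise ValueError("unsatisfiable or cyclic dependencies")
    return order


# The placement steps named in the abstract, wrapped around one computation.
workflow = [
    Job("allocate", "data"),
    Job("stage_in", "data", ["allocate"]),
    Job("compute", "compute", ["stage_in"]),
    Job("stage_out", "data", ["compute"]),
    Job("release", "data", ["stage_out"]),
]

for kind, name in run_workflow(workflow):
    print(f"{kind:8s} {name}")
```

Keeping one queue per job kind is the design point of the sketch: a real system could then apply placement-specific policies (retries, throttling, load balancing) to the data queue without touching how computational jobs are scheduled.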

Keywords/Search Tags:

Data, Widely distributed, Computation, Systems

Related items

1	Research On Automatic Data And Computation Decomposition On Distributed-Memory Systems
2	Performance Optimization Of Distributed Graph Computation Framework Based On BSP Model
3	Optimization Service: Parallelization And Distributed Computation
4	Joint Scheduling Of Data And Computation In Geo-distributed Cloud Systems
5	Distributed Gene Sequence Similarity Calculation Based On Secure Multiparty Computation
6	Study On Equalization Method Of High-speed Broadband Mobile Communication System
7	Research On Performance Optimization For Distributed Graph Computation
8	Study And Implementation On Distributed Large Scale Matrix Computation Algorithms With Spark
9	Optimizing Data Repair And Update For Erasure-Coded Systems With XOR-Based In-Network Computation
10	Widely Linear Beamforming Algorithms Based On Noncircularity Coefficient Estimation