Data placement in widely distributed systems

Posted on: 2006-09-14
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Kosar, Tevfik
Full Text: PDF
GTID: 1458390008470217
Subject: Computer Science
Abstract/Summary:
The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In such an environment, data is no longer locally accessible and must be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in a widely distributed environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such "data placements" also require careful management of storage resources and data movement, i.e., allocating storage space, staging in input data, staging out generated data, and de-allocating local storage after the data is safely stored at the destination.

Existing systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, which delays the computation, or performed by simple scripts that do not have the privileges of a job. In this dissertation, we propose a framework that de-couples computation and data placement, allows asynchronous execution of each, and treats data placement as a full-fledged job that can be queued, scheduled, monitored, and checkpointed like computational jobs. We regard data placement as an important part of the end-to-end process, and express this in a workflow language.

As data placement jobs have different semantics and characteristics than computational jobs, not all traditional techniques applied to computational jobs apply to data placement jobs. We analyze different scheduling strategies for data placement, and introduce a batch scheduler specialized for data placement. This scheduler implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources.

We provide a complete data placement subsystem for distributed computing systems, similar to the I/O subsystem in operating systems. This system offers transparent failure handling, reliable and efficient scheduling of data resources, load balancing on the storage servers, and traffic control on network links. It provides policy support, improves fault tolerance, and enables higher-level optimizations, including maximizing application throughput.
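
The idea of treating data placement as a first-class job can be illustrated with a minimal sketch. This is not the dissertation's scheduler or its workflow language; the names `Kind`, `Job`, `run_workflow`, and `flaky` are hypothetical, and the retry policy is an assumption chosen only to show placement steps being queued, ordered by dependency, and retried independently of the computation.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # The placement steps named in the abstract, plus the computation itself.
    ALLOCATE = "allocate"
    STAGE_IN = "stage_in"
    COMPUTE = "compute"
    STAGE_OUT = "stage_out"
    RELEASE = "release"


@dataclass
class Job:
    name: str
    kind: Kind
    depends_on: list = field(default_factory=list)
    max_retries: int = 2  # hypothetical policy: retries allowed after the first attempt
    attempts: int = 0
    state: str = "queued"


def run_workflow(jobs, execute):
    """Run jobs in dependency order and requeue failed jobs.

    `execute(job)` is a caller-supplied function returning True on success.
    Returns the names of the jobs in the order they completed.
    """
    done, completed = set(), []
    queue = deque(jobs)
    deferred = 0  # consecutive jobs skipped because a dependency was unmet
    while queue:
        job = queue.popleft()
        if any(dep not in done for dep in job.depends_on):
            queue.append(job)
            deferred += 1
            if deferred > len(queue):
                raise RuntimeError("unsatisfiable dependencies")
            continue
        deferred = 0
        job.attempts += 1
        if execute(job):
            job.state = "done"
            done.add(job.name)
            completed.append(job.name)
        elif job.attempts <= job.max_retries:
            queue.append(job)  # treat the failure as transient and try again later
        else:
            job.state = "failed"
            raise RuntimeError(f"{job.name} failed after {job.attempts} attempts")
    return completed


jobs = [
    Job("allocate", Kind.ALLOCATE),
    Job("stage_in", Kind.STAGE_IN, ["allocate"]),
    Job("compute", Kind.COMPUTE, ["stage_in"]),
    Job("stage_out", Kind.STAGE_OUT, ["compute"]),
    Job("release", Kind.RELEASE, ["stage_out"]),
]


def flaky(job):
    # Simulated executor: the first stage-in attempt fails, everything else succeeds.
    return not (job.kind is Kind.STAGE_IN and job.attempts == 1)


order = run_workflow(jobs, flaky)
print(order)
```

In this sketch the failed stage-in is retried by the scheduler loop rather than by the computation, so the compute job simply waits in the queue until its input is in place.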
Keywords/Search Tags: Data, Widely distributed, Computation, Systems