
Logistical scheduling for data movement in computational grids

Posted on: 2004-12-20
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Swany, Douglas Martin, Jr
Full Text: PDF
GTID: 1458390011457280
Subject: Computer Science
Abstract/Summary:
This work describes an approach to dynamic scheduling of data movement for Computational Grids. The growth in performance in networks has been insufficient to support distributed computing on a global scale. Given the vision of a worldwide Grid of computational and storage services, we note that the performance of moving data across a network is critical to individual program performance and to the viability of the Grid environment as a whole.

We observe that by using cooperative, short-term storage in the network, we can improve observed network performance. We refer to this use of storage as “logistics” in analogy to transportation and storage optimization. Due to the nature of Internet protocols, well-placed buffering can increase available bandwidth, particularly in high-performance computing environments.

Our goal is to improve available throughput by applying a “logistical” scheduling approach to the movement of data in Grid environments. We base these optimizations on forecasts of available capacity that are produced from the performance history of the resources in question. This performance “map” enables scheduling based on the cooperative forwarding described above.

This work describes the end-to-end performance improvements possible through the use of logistical storage “depots” in the network. We refer to this phenomenon as the “logistical effect” and we explore its causes and quantify the potential speedup. The performance advantage for a logistical session stems from the nature of the dominant Internet transport protocol, TCP. By scheduling data streams through depots in order to minimize the distance of any single hop, we are able to improve the observed host-to-host bandwidth.

To be truly useful, our logistical system must be able to automatically schedule data movement via some set of depots if necessary. To enjoy the performance benefits of data logistics, we must identify situations in which performance can be improved and determine the appropriate data movement schedules to effect this improvement. To hide the complexity of these decisions from the end user, we investigate scheduling algorithms that can optimize data movement.

Finally, we detail a middleware-based system which addresses constraints related to real-world usability. Results from this system, and the appropriateness of our scheduling approach, are empirically evaluated. We find that we can improve end-to-end throughput in a variety of cases and that our scheduling approach successfully identifies these cases.
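The depot-routing idea in the abstract can be illustrated with a small sketch. It is not the dissertation's algorithm. It assumes that a transfer pipelined through depots is limited by its slowest hop, so it picks the route whose smallest forecast per-hop bandwidth is as large as possible (a "widest path" variant of Dijkstra's algorithm). The function name `widest_path`, the host and depot names, and the bandwidth figures are all hypothetical.

```python
import heapq


def widest_path(bandwidth, source, sink):
    """Return (bottleneck, path) maximizing the minimum per-hop bandwidth.

    bandwidth: dict mapping node -> {neighbor: forecast bandwidth}.
    Assumption: end-to-end throughput equals the slowest hop on the route.
    """
    best = {source: float("inf")}   # widest known bottleneck to each node
    prev = {}                       # predecessor on the widest route
    heap = [(-best[source], source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry, a wider route was already found
        if node == sink:
            break
        for nxt, bw in bandwidth.get(node, {}).items():
            cand = min(width, bw)  # bottleneck if we extend through this hop
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if sink not in best:
        return 0.0, []  # sink unreachable
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[sink], path


# Hypothetical forecast "map" (Mbit/s): the direct route is slow,
# while shorter hops through depots are forecast to be faster.
forecast = {
    "src": {"dst": 20.0, "depot_a": 90.0, "depot_b": 60.0},
    "depot_a": {"dst": 75.0, "depot_b": 100.0},
    "depot_b": {"dst": 80.0},
}
bottleneck, path = widest_path(forecast, "src", "dst")
print(bottleneck, path)
```

With these made-up numbers the sketch routes through both depots, because every hop on that route is forecast to be faster than the direct connection. A real scheduler would also have to account for depot load, forecast error, and the overhead of each extra hop.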
Keywords/Search Tags: Scheduling, Data movement, Approach, Performance, Computational, Grid, Logistical, Network