Dynamic load balancing for parallel and distributed systems

Posted on: 2003-12-02
Degree: Ph.D
Type: Dissertation
University: Northwestern University
Candidate: Lan, Zhiling
Full Text: PDF
GTID: 1468390011484261
Subject: Computer Science
Abstract/Summary:
In many scientific applications, the computational load varies throughout execution, causing an uneven distribution of workload at run time. One such class is Adaptive Mesh Refinement (AMR) applications. AMR is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. A typical AMR application may require enormous computing resources, which a single-processor machine usually cannot provide, so parallel and distributed systems are required. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel and distributed systems. In investigating DLB schemes, we first complete a detailed analysis of structured AMR (SAMR) applications, identifying the unique characteristics that impose severe challenges on DLB schemes. The results indicate that most of the available DLB schemes are not appropriate for SAMR applications because of the unique adaptive characteristics of these applications. Thus, we propose a novel dynamic load balancing scheme for SAMR applications on parallel systems (denoted as parallel DLB). It integrates a grid-splitting technique with direct grid movements, with the objective of reducing the parallel execution time. Further, our experiment shows that simply moving a DLB scheme designed for parallel systems to distributed systems introduces significant overhead. Therefore, we propose a framework for dynamic load balancing on distributed systems (denoted as distributed DLB). It takes into consideration (1) the heterogeneity of processors, (2) the heterogeneity of networks, (3) the shared nature of networks, and (4) the adaptive characteristics of the applications. For SAMR applications, the distributed DLB incorporates the proposed parallel DLB during the load balancing process.
Both parallel DLB and distributed DLB were implemented in the ENZO code, a parallel implementation of SAMR in astrophysics and cosmology. Experiments show that the proposed DLB schemes can significantly improve the performance of SAMR applications on both parallel and distributed systems in terms of the total execution time and the quality of load balancing.
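The abstract names two ingredients, grid splitting and direct grid movement, without giving the algorithm. The sketch below is an illustration only and is not the dissertation's scheme or the ENZO implementation. It assumes each grid is reduced to a single workload number, that a grid can be split into two pieces of arbitrary size, and that each processor has a relative `speed` weight standing in for processor heterogeneity. The names `Processor` and `balance` and the `tolerance` parameter are hypothetical. Network costs, which the distributed framework also considers, are ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    speed: float = 1.0                         # assumed relative performance weight
    grids: list = field(default_factory=list)  # workload of each grid held locally

    def load(self):
        return sum(self.grids)

    def time(self):
        # Estimated time to process the local grids at this processor's speed.
        return self.load() / self.speed


def balance(procs, tolerance=0.05, max_steps=1000):
    """Greedy rebalancing: move a whole grid when one fits, otherwise split one."""
    total_load = sum(p.load() for p in procs)
    total_speed = sum(p.speed for p in procs)
    ideal_time = total_load / total_speed      # time if load were perfectly shared

    for _ in range(max_steps):
        src = max(procs, key=Processor.time)   # most overloaded processor
        dst = min(procs, key=Processor.time)   # most underloaded processor
        if src.time() <= ideal_time * (1 + tolerance):
            break                              # balanced within tolerance

        surplus = src.load() - ideal_time * src.speed
        deficit = ideal_time * dst.speed - dst.load()
        amount = min(surplus, deficit)         # work worth transferring this step
        if amount <= 0:
            break

        fitting = [g for g in src.grids if g <= amount]
        if fitting:
            # Direct grid movement: ship the largest grid that fits.
            g = max(fitting)
            src.grids.remove(g)
            dst.grids.append(g)
        else:
            # Grid splitting: every grid is too large, so cut the smallest one.
            g = min(src.grids)
            src.grids.remove(g)
            src.grids.append(g - amount)
            dst.grids.append(amount)
    return procs


# Usage: three processors, the third assumed to be twice as fast.
procs = [Processor(1.0, [8.0, 4.0]), Processor(1.0, []), Processor(2.0, [4.0])]
balance(procs)
print([p.time() for p in procs])
```

Moving whole grids first and splitting only when no grid fits keeps the number of grids small, which in a real SAMR code would matter because each split adds boundary communication. That cost is not modeled in this sketch.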
Keywords/Search Tags: Load, Distributed systems, Parallel, Applications, DLB, Execution