Failure-Aware Reconfigurable Distributed Virtual Machine for dependable and high productivity computing

Posted on:2009-07-21

Degree:Ph.D

Type:Dissertation

University:Wayne State University

Candidate:Fu, Song

Full Text:PDF

GTID:1448390005461236

Subject:Computer Science

Abstract/Summary:


Modern networked computing systems continue to grow in scale and in the complexity of their components and interactions. In these environments, component failures have become the norm rather than the exception, so it is important to ensure the availability and adaptivity of computing services. To this end, we present the Failure-Aware Reconfigurable Distributed Virtual Machine (FAR-DVM) framework for building failure-resilient and dependable high-productivity computing systems.

The framework monitors and analyzes node-, cluster- and system-wide failure behaviors and forecasts prospective failure occurrences based on quantified failure dynamics. The prediction results are used to manage system resources in a failure-aware manner. The system management components autonomically construct resilient and dependable services and integrate geographically distributed resources into a seamless environment.

Within the FAR-DVM framework, we propose hPREFECTS for proactive failure management. It collects failure events from compute nodes at runtime and constructs a failure signature for each event. It then analyzes the temporal and spatial correlations among failure signatures in different system scopes. A failure predictor uses the quantified correlation data to forecast the occurrence time of failures in the near future.

To manage system resources in a failure-aware manner, we also propose a construction and reconfiguration strategy for distributed virtual machines (DVMs) that leverages the failure prediction results. We consider both the performance and the reliability status of compute nodes, and define a capacity-reliability metric that combines the effects of both factors in node selection. We propose Best-fit algorithms with optimistic and pessimistic selection strategies to find the best-qualified nodes on which to construct and reconfigure DVMs.

We have designed and implemented a prototype of FAR-DVM and evaluated it in production environments. hPREFECTS achieves more than 76% accuracy in offline failure prediction on the Los Alamos HPC traces. For online prediction, its accuracy is more than 70% in the Wayne State Computational Grid. Our failure-aware resource management strategy improves system productivity even with practically achievable failure prediction accuracy. With the Best-fit strategies, the job completion rate is 17.6% higher than that achieved in the current LANL HPC cluster. The task completion rate reaches 91.7% with 83.6% utilization of relatively unreliable nodes.

Complementing the work on failure-aware resource management, we have also proposed a service migration mechanism that moves runtime computing services from one compute node to another in the face of system anomalies. To evaluate the quality of migration policies, we have investigated the migration decision problem for load balancing. We derive the optimal time for service migration with the objective of minimizing migration frequency, and obtain a lower bound on the destination server capacity.
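The abstract does not define the capacity-reliability metric or the two Best-fit selection strategies, so the following Python sketch is only one possible reading of the node-selection idea, not the dissertation's algorithm. Everything in it is an assumption made for illustration: the `Node` fields, the weighted-sum score in `capacity_reliability`, the `weight` parameter, and the interpretation that the optimistic strategy trades fit against predicted reliability while the pessimistic strategy admits only nodes predicted to outlive the job.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_capacity: float     # available capacity, arbitrary units (hypothetical field)
    hours_to_failure: float  # predicted time to next failure (hypothetical predictor output)


def capacity_reliability(node: Node, demand: float, duration: float,
                         weight: float = 0.5) -> float:
    """Illustrative combined score in [0, 1]; higher is better.

    Assumed form, not taken from the dissertation: a weighted sum of how
    tightly the job fits the node and how likely the node is to outlive it.
    """
    fit = demand / node.free_capacity  # 1.0 means a perfect fit
    reliability = min(1.0, node.hours_to_failure / duration)
    return weight * fit + (1.0 - weight) * reliability


def best_fit(nodes: List[Node], demand: float, duration: float,
             pessimistic: bool = False) -> Optional[Node]:
    """Pick the highest-scoring node that can host the job, or None."""
    candidates = [n for n in nodes if n.free_capacity >= demand]
    if pessimistic:
        # Assumed pessimistic rule: reject nodes predicted to fail mid-job.
        candidates = [n for n in candidates if n.hours_to_failure >= duration]
    if not candidates:
        return None
    return max(candidates, key=lambda n: capacity_reliability(n, demand, duration))


# Made-up example data: a job needing 8 units of capacity for 10 hours.
NODES = [
    Node("A", free_capacity=8.0, hours_to_failure=9.0),
    Node("B", free_capacity=10.0, hours_to_failure=50.0),
    Node("C", free_capacity=16.0, hours_to_failure=100.0),
]

if __name__ == "__main__":
    print(best_fit(NODES, demand=8.0, duration=10.0))
    print(best_fit(NODES, demand=8.0, duration=10.0, pessimistic=True))
```

With this made-up data, the optimistic call selects node A (a perfect capacity fit that is predicted to fail just before the job ends), while the pessimistic call rejects A and selects node B. The gap between the two choices is the kind of trade-off a combined capacity-reliability score is meant to expose.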

Keywords/Search Tags:

Failure, Computing, Distributed virtual, System, Migration, Dependable


Related items

1	Research On Dependable Computing Oriented Distributed Fault Detection System
2	Research On Dependable Job Scheduling In Grid
3	The Research And Implement Of Distributed JVM Based On Thread Migration
4	Autonomic failure identification and diagnosis for building dependable cloud computing systems
5	Research On Virtual Machine Migration Mechanism In Cloud Computing
6	Research Of Virtual Resource Consolidation Algorithm Based On Virtual Cluster Migration
7	The Dynamic Migration Of Virtual Machines In The Cloud Computing Research
8	Research On Dynamic Migration Strategy Of Virtual Machines In Mobile Edge Computing Environment
9	Research On Resource Scheduling Strategy Of Cloud Computing Platform In Electric Power System
10	Research On Virtual Machine Migration Based On Multi-Objective Optimization In Cloud Computing Environment