A Study of Host Fault-Tolerance Mechanisms in Parallel Computing

Posted on: 2012-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
GTID: 2218330338469948
Subject: Software engineering
Abstract/Summary:
As high-performance parallel computing systems become more widespread, the probability of software and hardware failures grows. Because the grid itself is highly dynamic and its resources are heterogeneous, a grid computing platform has a greater chance of error than a traditional computing platform. Fault tolerance and reliability have become major constraints on application scalability, and fault tolerance in high-performance parallel computing is receiving more and more attention. How to add a fault-tolerance mechanism suited to the characteristics of a grid system is a valuable question in high-performance computing research. Building on a detailed understanding of the grid platform, this thesis consists of the following parts:

First, P2P-MPI is used as the experimental platform to validate various error detection methods. Each method is evaluated and its applicability is explored.

Second, the thesis discusses the impact of error recovery mechanisms, replica consistency, the number of replicas, and other network parameters on the backup process group, and investigates how to find the best number of backups. The host allocation strategy, which takes into account the influence of bandwidth and CPU capacity, is a key point of the thesis, although these factors are only simulated using models.

Third, we search for the best backup process. We propose a failure probability that can be tolerated; under the conditions of this probability the recovery mechanism starts immediately, so that more bandwidth is saved.
Keywords/Search Tags: fault tolerant, fault detection, fault recovery, host allocation strategy
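
The relationship between the number of backups and a tolerable failure probability mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of the thesis. It assumes that every host fails independently with the same probability `p_host`, that a logical process is lost only when all of its replicas fail, and that the application fails as soon as any logical process is lost. The function names and parameters are hypothetical and chosen only for illustration.

```python
from typing import Optional


def app_failure_probability(p_host: float, n_procs: int, replicas: int) -> float:
    """Probability that at least one logical process loses all of its replicas.

    Assumption (not from the thesis): hosts fail independently with
    probability p_host, and each logical process runs on `replicas` hosts.
    """
    p_group_lost = p_host ** replicas          # all replicas of one process fail
    return 1.0 - (1.0 - p_group_lost) ** n_procs


def min_replicas(p_host: float, n_procs: int, tolerated: float,
                 max_replicas: int = 16) -> Optional[int]:
    """Smallest replication degree whose failure probability is <= tolerated.

    Returns None if no degree up to max_replicas meets the target.
    """
    for r in range(1, max_replicas + 1):
        if app_failure_probability(p_host, n_procs, r) <= tolerated:
            return r
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 8 processes, 5% host failure rate, 0.5% tolerated.
    print(min_replicas(p_host=0.05, n_procs=8, tolerated=0.005))
```

Under this simplified model, each extra replica lowers the failure probability but also adds the communication needed to keep replicas consistent, which is why a search for the smallest sufficient degree is of interest. Bandwidth and CPU capacity, which the thesis considers in host allocation, are not modeled here.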