
The Design And Implementation Of Proactive Fault Tolerant System Based On Dynamic Migration Of Virtual Machine

Posted on: 2015-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Meng
GTID: 2308330464964660
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud computing, demand for computing resources is increasing in all areas of society, and high-performance computing clusters are being applied ever more widely. Many systems, including banking systems, shopping systems, weather systems, and scientific computing systems, require large amounts of computing resources. To meet this growing demand, computing clusters keep expanding in scale, and in such large clusters the incidence of node failures rises. A node may fail for many reasons, chiefly related to hardware, software, environment, network, and human factors; of these, hardware failures occur most frequently. To reduce the impact of hardware failures on the system, this thesis studies proactive fault-tolerance techniques. The main contributions are the following three points:

1. To reduce the impact of cluster hardware failures, we design and implement a proactive fault-tolerant system for large-scale scientific computing clusters. The system consists of three modules: a data collection module, an error prediction module, and a fault-tolerant migration module. The data collection module collects the cluster's hardware resource data and system resource data and sends them to the error prediction module. The error prediction module analyzes the received data with either the threshold algorithm or the threshold-gradient algorithm and sends the prediction result to the migration module. The migration module receives the information about unhealthy nodes together with the scheduling results obtained from the scheduler, and then migrates the tasks on each unhealthy host to a healthy host.

2. We propose an error prediction algorithm suited to proactive fault tolerance. Existing algorithms do not consider data trends; by introducing the rate of change of the data into error prediction, we propose a threshold-gradient prediction algorithm. The algorithm considers not only the value itself but also its trend, which increases fault prediction accuracy.

3. We propose a method for selecting the prediction algorithm. The method is based on the characteristics of the monitored hardware data, which determine the risk that the hardware poses to the entire computer system. By describing how a hardware component's risk factor changes with its data, we select the algorithm that suits that pattern, enabling the system to predict from hardware data with a more flexible and effective prediction algorithm.
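The abstract does not give the formulas behind the threshold and threshold-gradient algorithms, so the following Python sketch only illustrates the general idea of combining a value check with a rate-of-change check. All names (`ThresholdGradientRule`, `warn_fraction`, `max_gradient`, `unhealthy_nodes`) and the specific decision rule are assumptions made for illustration, not the thesis's actual method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ThresholdGradientRule:
    """Hypothetical parameters; the abstract specifies none of them."""
    threshold: float            # value at which the plain threshold rule fires
    max_gradient: float         # assumed limit on average change per sample
    warn_fraction: float = 0.8  # assumed: the trend only counts near the threshold


def threshold_predict(samples: Sequence[float], threshold: float) -> bool:
    """Plain threshold rule: flag when the latest value reaches the threshold."""
    return bool(samples) and samples[-1] >= threshold


def mean_gradient(samples: Sequence[float]) -> float:
    """Average change per sample over the window (0.0 with fewer than two samples)."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def threshold_gradient_predict(samples: Sequence[float],
                               rule: ThresholdGradientRule) -> bool:
    """Assumed rule: flag on the value itself, or on a steep rise near the threshold."""
    if not samples:
        return False
    latest = samples[-1]
    if latest >= rule.threshold:
        return True
    near_threshold = latest >= rule.warn_fraction * rule.threshold
    return near_threshold and mean_gradient(samples) >= rule.max_gradient


def unhealthy_nodes(readings: Dict[str, Sequence[float]],
                    rule: ThresholdGradientRule) -> List[str]:
    """Names of nodes whose readings are flagged, i.e. migration candidates."""
    return sorted(node for node, samples in readings.items()
                  if threshold_gradient_predict(samples, rule))
```

With an assumed rule of `ThresholdGradientRule(threshold=90.0, max_gradient=2.0)` applied to temperature readings, a window such as `[70, 74, 78, 82]` stays below the plain threshold but is flagged by the gradient check, which is the kind of early warning that considering data trends is meant to provide.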
Keywords/Search Tags: Proactive Fault Tolerant, Failure Prediction, Threshold-gradient model, Dynamic migration