
A Research Of The Proactive Fault Tolerance Scheme For Distributed Storage Systems

Posted on: 2017-03-21 | Degree: Master | Type: Thesis
Country: China | Candidate: X P Ji
GTID: 2348330503992391 | Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, the volume of data worldwide is growing explosively, and storage systems keep expanding in scale. Larger systems, however, inevitably suffer higher failure rates, so improving the reliability of cloud storage systems has become an urgent problem. Cloud storage systems currently rely mostly on passive fault tolerance, a “failure occurrence, then data reconstruction” mechanism. Because of its inherent limitations, passive fault tolerance can hardly solve the reliability problem at its root. Some researchers have therefore proposed proactive fault tolerance, a “failure prediction, then pre-warning handling” mechanism. Hard drive failure prediction models can achieve relatively high accuracy and a low false alarm rate, identifying soon-to-fail drives in advance. However, few researchers have applied these prediction models to distributed systems to improve their reliability.

This paper proposes a proactive fault tolerance mechanism called Self-Scheduling Migration (SSM). First, SSM monitors the health status of hard drives and collects their SMART (Self-Monitoring, Analysis and Reporting Technology) data to train a prediction model. Second, it uses the model to predict soon-to-fail drives. Finally, based on the prediction results, it migrates data from soon-to-fail drives to other drives in advance. We introduce a distributed pre-warning handling algorithm into distributed systems to transfer data off soon-to-fail drives. The algorithm dynamically adjusts migration rates according to each drive's severity level, which is derived from real-time prediction results. It also makes full use of system resources and balances load when selecting migration source and destination drives.
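The severity grading and source/destination selection described above can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the thresholds in `SEVERITY_THRESHOLDS`, the use of I/O utilization as the load metric, and the helpers `Drive`, `severity_level`, `pick_source` and `pick_destination` are all hypothetical. It also assumes objects are replicated (as in Sheepdog), so that any drive holding a copy can serve as the read source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical cut-offs on the fraction of recent prediction windows that
# flagged a drive as soon-to-fail; the thesis does not publish these values.
SEVERITY_THRESHOLDS = (0.2, 0.5, 0.8)  # reaching each cut-off raises the level by 1


@dataclass
class Drive:
    name: str
    load: float        # current I/O utilization in [0, 1] (assumed load metric)
    free_bytes: int
    severity: int = 0  # 0 = healthy, 3 = most urgent


def severity_level(recent_predictions: Sequence[bool]) -> int:
    """Map a window of real-time predictions (True = predicted to fail) to a level 0..3."""
    if not recent_predictions:
        return 0
    ratio = sum(recent_predictions) / len(recent_predictions)
    return sum(1 for cutoff in SEVERITY_THRESHOLDS if ratio >= cutoff)


def pick_source(replica_holders: Sequence[Drive]) -> Optional[Drive]:
    """Read the object from the least-loaded drive that holds a copy of it."""
    return min(replica_holders, key=lambda d: d.load, default=None)


def pick_destination(drives: Sequence[Drive], needed_bytes: int) -> Optional[Drive]:
    """Write to the least-loaded healthy drive with enough free space."""
    candidates = [d for d in drives if d.severity == 0 and d.free_bytes >= needed_bytes]
    return min(candidates, key=lambda d: d.load, default=None)


if __name__ == "__main__":
    drives = [
        Drive("sda", load=0.7, free_bytes=10**12, severity=2),
        Drive("sdb", load=0.2, free_bytes=10**12),
        Drive("sdc", load=0.1, free_bytes=10**6),
    ]
    print(severity_level([True, False, True, True]))  # 2
    print(pick_source(drives[:2]).name)               # sdb
    print(pick_destination(drives, 10**9).name)       # sdb (sdc lacks space)
```

Choosing the least-loaded candidate on both ends is one simple way to spread migration traffic; a real system would likely also weigh network topology and placement rules.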
To minimize the impact of migration on the system's read and write services, migration bandwidth is allocated according to the drives' different severity levels.

This paper implements a prototype based on the Sheepdog distributed storage system. In the prototype, migration causes performance drops of only 8% for read operations and 13% for write operations. Compared with passive (reactive) fault tolerance, SSM significantly improves system reliability and availability.
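The severity-based bandwidth allocation described above can be sketched as a weighted proportional split under a cap. This is only an assumed scheme: the weights in `SEVERITY_WEIGHTS`, the `migration_share` cap that reserves the remaining bandwidth for foreground reads and writes, and the function `allocate_migration_bandwidth` are hypothetical and do not come from the thesis.

```python
from typing import Dict

# Hypothetical weights per severity level: more urgent drives get a larger share.
SEVERITY_WEIGHTS = {0: 0, 1: 1, 2: 2, 3: 4}


def allocate_migration_bandwidth(
    severities: Dict[str, int],
    total_bandwidth: float,
    migration_share: float = 0.2,
) -> Dict[str, float]:
    """Split a capped migration budget among drives in proportion to severity weight.

    Only `migration_share` of `total_bandwidth` goes to migration; the rest is
    left for foreground read/write requests. Healthy drives (level 0) get 0.
    """
    if not 0.0 <= migration_share <= 1.0:
        raise ValueError("migration_share must be within [0, 1]")
    budget = total_bandwidth * migration_share
    weights = {drive: SEVERITY_WEIGHTS[level] for drive, level in severities.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {drive: 0.0 for drive in severities}
    return {drive: budget * w / total_weight for drive, w in weights.items()}


if __name__ == "__main__":
    # Re-run whenever new predictions change a drive's severity level.
    print(allocate_migration_bandwidth({"sda": 3, "sdb": 1, "sdc": 0}, total_bandwidth=1000.0))
```

With a total of 1000 (in any unit, e.g. MB/s) and the default 20% cap, the 200-unit budget is split 160 to `sda`, 40 to `sdb` and 0 to the healthy `sdc`. Recomputing the split whenever severity levels change is one way to realize the dynamic rate adjustment the abstract describes.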
Keywords/Search Tags: proactive fault tolerance, distributed storage system, priority scheduling, data migration