A Research Of The Proactive Fault Tolerance Scheme For Distributed Storage Systems

Posted on:2017-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:X P Ji

Full Text:PDF

GTID:2348330503992391

Subject:Computer Science and Technology

Abstract/Summary:

With the arrival of the big data era, the global volume of data is growing explosively, and the scale of storage systems continues to expand with it. Larger systems inevitably suffer higher failure rates, so improving the reliability of cloud storage systems has become an urgent problem. At present, cloud storage systems commonly use a passive fault tolerance mechanism, which follows a “failure occurrence, then data reconstruction” pattern. Because of its inherent defects, passive fault tolerance cannot fundamentally solve the reliability problem. Some researchers have therefore proposed proactive fault tolerance, which follows a “failure prediction, then pre-warning handling” pattern. Hard drive failure prediction models can achieve relatively high accuracy with a low false alarm rate, and can identify soon-to-fail drives in advance. However, few scholars have applied these prediction models to distributed systems to improve their reliability.

This paper proposes a proactive fault tolerance mechanism called Self-Scheduling Migration (SSM). First, it monitors the health status of hard drives and collects their SMART (Self-Monitoring, Analysis and Reporting Technology) data to train the prediction model. Second, it uses the prediction model to predict soon-to-fail drives. Finally, it uses the prediction results to migrate data from soon-to-fail drives to other drives in advance. We introduce a distributed pre-warning handling algorithm into distributed systems to transfer data off soon-to-fail drives. The algorithm dynamically adjusts migration rates according to the drives' severity levels, which are derived from real-time prediction results. Moreover, it makes full use of resources and balances load when selecting migration source and destination drives.
While minimizing the side effects of migration on the system's read and write services, migration bandwidth is allocated reasonably according to the drives' different severity levels.

This paper implements a prototype based on the Sheepdog distributed system. Migration causes performance drops of only 8% and 13% on read and write operations, respectively. Compared with reactive fault tolerance, SSM significantly improves system reliability and availability.
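
The abstract does not give the details of the SSM algorithm, so the following Python sketch only illustrates the general idea of severity-aware migration scheduling and is not the thesis's implementation. The `Drive` class, the integer severity scale, the `used_fraction` load metric, the 20% cap on migration bandwidth, and both function names are assumptions made for this example. It splits a capped migration budget across soon-to-fail drives in proportion to their severity, and picks the least-loaded healthy drive as a destination.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    severity: int         # assumed scale: 0 = healthy, larger = failure predicted sooner
    used_fraction: float  # assumed load metric: share of capacity already in use


def allocate_migration_bandwidth(drives, total_mbps, migration_share=0.2):
    """Split a capped migration budget across at-risk drives by severity.

    migration_share is a hypothetical cap that reserves the remaining
    bandwidth for foreground read and write requests.
    """
    budget = total_mbps * migration_share
    at_risk = [d for d in drives if d.severity > 0]
    weight = sum(d.severity for d in at_risk)
    if weight == 0:
        return {}  # nothing is predicted to fail, so nothing migrates
    return {d.name: budget * d.severity / weight for d in at_risk}


def pick_destination(drives):
    """Return the name of the least-loaded healthy drive, or None if there is none."""
    healthy = [d for d in drives if d.severity == 0]
    if not healthy:
        return None
    return min(healthy, key=lambda d: d.used_fraction).name


drives = [
    Drive("disk-a", severity=3, used_fraction=0.70),
    Drive("disk-b", severity=1, used_fraction=0.40),
    Drive("disk-c", severity=0, used_fraction=0.55),
    Drive("disk-d", severity=0, used_fraction=0.25),
]

rates = allocate_migration_bandwidth(drives, total_mbps=1000.0)
target = pick_destination(drives)
```

In this sketch the drive with the higher severity level receives the larger share of the migration budget, while the cap keeps most of the bandwidth available for normal service. A real scheduler would also need to re-run the allocation whenever the prediction results change.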

Keywords/Search Tags:

proactive fault tolerance, distributed storage system, priority scheduling, data migration

Related items

1	The Research And Implementation Of Distributed Storage System Fault-tolerance Mechanism
2	Research On Active Fault Tolerance Scheme For Distributed Storage System
3	ProActive-based Parallel Program Fault-tolerant Task-scheduling
4	Research And Realization On High Fault-Tolerance Distributed Shared Storage Mechanism And System Implementation
5	Design And Implementation Of Distributed Video Storage Fault-tolerant System
6	Research On The Real-Time Fault-Tolerant Scheduling Algorithms For Distributed Systems
7	The Design And Implementation Of Fault Tolerance System In The Distributed Storage System
8	Optimization Techniques Of Proactive Fault Tolerance For Large-scale High Performance Computing Systems
9	Research On Erasure Codes Based Data Fault Tolerance And Repair For Mobile Distributed Storage Clusters
10	Research Of A Distributed Encryption Storage Scheme With Fault Tolerance And Its Application