Research On Active Fault Tolerance Scheme For Distributed Storage System

Posted on:2024-07-03

Degree:Master

Type:Thesis

Country:China

Candidate:Q H Li

GTID:2568307130958189

Subject:Computer technology

Abstract/Summary:

With the explosive growth of global data and the increasing scale of distributed storage systems, hard disk failures have become the norm, and guarantees of data reliability and service availability are seriously threatened. Compared with passive fault tolerance alone, a distributed storage system that also uses active fault tolerance can respond more smoothly and effectively to the problems caused by hard disk failures. Active fault tolerance in storage systems mainly comprises two parts: hard disk failure prediction and advance data recovery. Both have been studied extensively, domestically and internationally, but most hard disk failure prediction models are built for a single model of mechanical hard disk and cannot handle the disk heterogeneity found in distributed storage systems, and a research gap remains in advance data recovery for distributed storage scenarios that require low latency and high reliability. Current active fault tolerance technology therefore cannot meet the needs of distributed storage systems. In this thesis, we investigate an active fault tolerance scheme for distributed storage systems with the goal of improving data reliability and ensuring service availability. First, for hard disk failure prediction, we propose a Multi-type Disk Failure Prediction (MTDFP) method for distributed storage systems, which builds a dedicated failure prediction model with better prediction performance for each hard disk series in the system. MTDFP is validated on two public datasets from real enterprise environments, and the experimental results show that the method achieves an average FDR (Failure Detection Rate) of 78%, which provides a better basis and guidance for the subsequent advance data recovery strategy in the active fault tolerance scheme. Second, for early data recovery, we propose a Data Scheduling Optimizer (DSO) for distributed storage systems based on a spare storage resource pool and early-warning priority, which migrates the at-risk data on multiple pre-failed hard disks to the spare drives in advance, in order of early-warning priority. The DSO has been implemented and tested on Ceph storage systems, and the experimental results show that the strategy not only greatly reduces the additional data migration and data recovery time of the cluster, but also significantly improves the performance of cluster read and write operations. Finally, based on MTDFP and DSO, a complete active fault tolerance scheme for Ceph storage systems is formed, covering data acquisition, prediction, and scheduling. The reliability quantification results show that the scheme can improve data reliability by 1-3 dimensions in Ceph clusters deployed with different policies.
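
The abstract does not give the DSO algorithm itself, so the following is only a minimal sketch of the general idea it describes: pre-failed disks are served in order of early-warning priority and their data is assigned to drives from a spare pool. Everything here is an assumption made for illustration, including the names `DiskWarning` and `plan_migrations`, the convention that a lower priority number means a more urgent warning, and the rule of picking the spare with the most free space. It does not touch Ceph and is not the thesis's implementation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class DiskWarning:
    # Hypothetical record for one pre-failed disk.
    # Assumption: a lower number means a more urgent early warning.
    priority: int
    disk_id: str = field(compare=False)
    used_gb: int = field(compare=False)


def plan_migrations(warnings, spare_free_gb):
    """Assign each pre-failed disk to a spare drive, most urgent first.

    warnings      -- iterable of DiskWarning
    spare_free_gb -- dict mapping spare drive id to free capacity in GB
    Returns (plan, unplaced): plan is a list of (disk_id, spare_id) pairs in
    the order they should be migrated; unplaced lists disks that no spare
    could hold.
    """
    heap = list(warnings)
    heapq.heapify(heap)              # min-heap keyed on priority only
    free = dict(spare_free_gb)       # copy so the caller's dict is untouched
    plan, unplaced = [], []

    while heap:
        warning = heapq.heappop(heap)
        # Illustrative placement rule: use the spare with the most free space.
        spare = max(free, key=free.get, default=None)
        if spare is None or free[spare] < warning.used_gb:
            unplaced.append(warning.disk_id)
            continue
        free[spare] -= warning.used_gb
        plan.append((warning.disk_id, spare))

    return plan, unplaced


if __name__ == "__main__":
    warnings = [
        DiskWarning(2, "sdb", 300),
        DiskWarning(1, "sdc", 500),
        DiskWarning(3, "sdd", 400),
    ]
    spares = {"spare0": 600, "spare1": 500}
    print(plan_migrations(warnings, spares))
```

In this toy setup the most urgent disk is placed first, and a disk that no spare can hold is reported rather than silently dropped. A real scheduler would also have to account for placement-group layout, migration bandwidth, and concurrent client I/O, none of which are modeled here.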

Keywords/Search Tags:

Distributed storage systems, Active fault tolerance technology, Hard drive failure prediction, Early data recovery, Ceph

Related items

1	The Optimization Of Cross-rack Data Repair Technology For Distributed Storage Systems Based On Ceph
2	Research On Method For Hard Drive Failure Prediction In Massive Storage System
3	Failure Tolerance And Prediction For Storage Systems In Datacenters
4	Research On Node Fault Tolerance Selection And Backup Data Transmission In CEPH Distributed Storage System
5	Research On Disk Fault Warning Processing Method For Active Fault-tolerant Storage System
6	Research On Hard Disk Fault Prediction Technology In Massive Data Storage System
7	Design and evaluation of distributed wide-area on-line archival storage systems
8	Study On Fault-tolerant Mechanisms Of Distributed Storage Systems Based On Network Coding
9	Study On Backward Recovery Of Fault Tolerant Technology In Distributed Systems
10	Research On Failure Prediction And Fault-tolerance Technology For Supercomputer