The Optimization Of Parity Update And Failure Recovery For Fault-tolerant Storage System

Posted on: 2018-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: D D Sun
Full Text: PDF
GTID: 2348330515496439
Subject: Computer software and theory
Abstract/Summary:
With the exponential growth of data, storage systems keep growing in scale. A traditional single-node storage system can no longer meet current storage demands. A distributed storage system, built from a large number of inexpensive commodity machines, can provide high performance and excellent scalability. However, as more storage nodes are added to the cluster, node failures become common. Keeping data reliable in a large-scale storage system therefore becomes an important issue.

There are two ways to make a storage system fault-tolerant: replication and erasure coding. Because erasure codes incur much less storage overhead for the same level of fault tolerance, they are widely deployed in modern storage systems. Erasure coding divides the data into chunks of equal size and encodes the data chunks into a number of parity chunks. If fewer chunks fail than there are parity chunks, the failed data can be decoded from the surviving chunks. To ensure the reliability of the storage system, the parity must stay consistent with the related data chunks. That is, when data chunks are modified, the parity chunks must be updated accordingly. Furthermore, once a parity node fails, the recovery process should finish as soon as possible. This thesis focuses on efficient parity update and efficient parity recovery. The main content and contributions are as follows:

(1) Efficient parity update for RAID-like storage systems

RAID-like storage systems inevitably have to be scaled as the demand for storage capacity and I/O throughput increases. When new disks are added to the current system, some data must be migrated from the old disks to the new disks to keep the load balanced and to make full use of the throughput provided by the new disks. The parity update problem becomes more serious during the scaling process. We propose an efficient parity update algorithm (abbreviated EPU). EPU adjusts the scaling sequence based on the zones accessed by user requests, so that user I/O can be exploited to reduce scaling I/O. EPU also chooses the best parity update scheme by comparing the overhead induced by the different schemes. Additionally, EPU uses access aggregation to reduce the system load. As a result, EPU reduces the overhead induced by parity update and shortens the scaling process.

(2) Speeding up the recovery of parity nodes

A distributed storage system can aggregate thousands of storage servers. Once a node fails, it must be repaired to maintain data availability, and the recovery process should finish as soon as possible. If the repair takes too long, more nodes may fail in the meantime. Once the number of failed nodes exceeds the fault tolerance of the system, data is lost permanently. Efficient disk recovery is therefore essential to system reliability. In this thesis, we propose a new erasure code that is friendly to the recovery of a single node failure. We name this efficient single-failure recovery code ESRC. ESRC reduces the overhead of single-node recovery. Even compared with LRC, which has the most efficient single-node recovery, ESRC still reduces the overhead induced by the recovery of a single parity node. Moreover, ESRC maintains a low storage overhead.
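The requirement that parity stay consistent with its data chunks, and the idea of comparing the cost of different parity update schemes, can be illustrated with a minimal sketch. It assumes the simplest case, a single XOR parity chunk per stripe as in RAID-5, and it counts only chunk reads as the cost. It is not the EPU algorithm or the ESRC code, whose details the abstract does not give; all function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-size chunks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(chunks: list[bytes]) -> bytes:
    """Single XOR parity over all data chunks of a stripe (RAID-5 style)."""
    return reduce(xor_bytes, chunks)


def update_rmw(old_parity: bytes, old_chunk: bytes, new_chunk: bytes) -> bytes:
    """Read-modify-write: new parity = old parity XOR old data XOR new data."""
    return xor_bytes(old_parity, xor_bytes(old_chunk, new_chunk))


def update_rcw(current_chunks: list[bytes]) -> bytes:
    """Reconstruct-write: recompute the parity from every current data chunk."""
    return encode_parity(current_chunks)


def recover(surviving_chunks: list[bytes], parity: bytes) -> bytes:
    """Rebuild one lost data chunk from the parity and the surviving chunks."""
    return reduce(xor_bytes, surviving_chunks, parity)


def cheaper_scheme(k: int, modified: int) -> str:
    """Pick the scheme with fewer chunk reads (an assumed, simplified cost model).

    k is the number of data chunks in the stripe and modified is how many
    of them the write changes.
    """
    rmw_reads = modified + 1   # old copies of modified chunks plus old parity
    rcw_reads = k - modified   # the unmodified chunks of the stripe
    return "rmw" if rmw_reads <= rcw_reads else "rcw"
```

For example, with a stripe of three chunks `[b"AAAA", b"BBBB", b"CCCC"]`, overwriting the second chunk and calling `update_rmw` yields the same parity as calling `update_rcw` on the updated stripe. Under this cost model, small writes to wide stripes favor read-modify-write, while writes that touch most of a stripe favor reconstruct-write.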
Keywords/Search Tags: storage system, erasure code, fault-tolerance, parity update, parity recovery