
Research On RAID-6 Codes And Fast Failure Recovery In Storage System

Posted on: 2016-05-12  Degree: Doctor  Type: Dissertation
Country: China  Candidate: P Xie  Full Text: PDF
GTID: 1108330467998328  Subject: Computer system architecture
Abstract/Summary:
Because user data is so valuable, storage reliability is a key research direction in the storage community. Replication and array codes are currently the two main fault-tolerance methods for storage systems. Compared with replication, array codes offer better storage efficiency. RAID-6 codes, which tolerate the failure of any two disks, have become a research hot spot in both academia and industry. Recovery performance and I/O balancing are very important metrics for RAID-6 codes, and they directly affect the reliability and availability of RAID-coded storage systems. Therefore, considering both coding and scheduling, this dissertation focuses on designing a new RAID-6 code, optimizing the data layout of RAID-6, and improving the recovery performance of RAID-6. The main contributions are as follows.

This dissertation presents a new class of XOR-based RAID-6 codes, called V2-Code. The code is non-MDS and delivers better recovery performance than popular MDS codes. It is also a vertical code: parity blocks are distributed uniformly across all disks, so I/O is balanced. V2-Code's distinguishing features include lowest-density coding, a steady parity chain length, and well-balanced computation. The performance evaluation shows that V2-Code outperforms popular RAID-6 codes in terms of load balancing and recovery time. For example, in the single-disk-failure and double-disk-failure cases, V2-Code speeds up the recovery time of X-Code by factors of up to 3.31 and 1.79, respectively.

This dissertation proposes a novel and efficient data layout, UPC (Uniform P-Code), to support highly balanced I/O among P-Coded disk arrays (PC). In UPC, the non-uniformly distributed information symbols in each parity chain of P-Code are moved along their columns to other rows, so that each parity chain keeps its original parity relationships and the array still tolerates double disk failures. The UPC scheme not only achieves optimal storage efficiency, computational complexity, and update complexity, but also supports better I/O balancing in large-scale storage systems. The experimental results show that the UPC scheme significantly outperforms the PC scheme in terms of average user response time. In particular, for a 12-disk array, the UPC scheme improves the average user response time of the PC scheme by 29.9%.

This dissertation proposes a heterogeneity-aware single-failure recovery scheme, SmartRec, for double- and multiple-fault-tolerant RAIDs. To take into account both the static heterogeneity associated with disk configurations and the dynamic heterogeneity caused by I/O loads, SmartRec periodically selects an appropriate reconstruction solution according to up-to-date disk utilization. To quantitatively assess SmartRec and three alternatives (ConRec, MinRec, and BalRec), which include strategies that minimize the data read for reconstruction and strategies that balance reconstruction reads, we formulate four reconstruction-time models and validate them through empirical evaluation. We implement the four reconstruction schemes in a heterogeneous RAID. The experimental results show that SmartRec outperforms the three existing reconstruction schemes in terms of reconstruction time. For example, for a 9-disk array under online reconstruction, SmartRec improves the recovery time of the ConRec scheme by 35.3%.
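The idea of choosing a reconstruction solution from current disk utilization can be illustrated with a toy Python sketch. It is not the dissertation's SmartRec algorithm: the function names (`xor_blocks`, `pick_chain`, `rebuild`), the "least busy bottleneck disk" heuristic, and the sample utilization figures are all assumptions made for illustration. The only fact relied on is that in an XOR-based code a lost block equals the XOR of the surviving blocks in any parity chain that contains it, so a RAID-6 array usually offers more than one way to rebuild the same block.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def pick_chain(chains, utilization):
    """Hypothetical heuristic: choose the parity chain whose busiest
    surviving disk has the lowest current utilization."""
    return min(chains, key=lambda chain: max(utilization[d] for d in chain))

def rebuild(chains, utilization, read_block):
    """Rebuild one lost block from the selected parity chain."""
    chain = pick_chain(chains, utilization)
    return chain, xor_blocks([read_block(d) for d in chain])

# Toy setup: the lost block belongs to two parity chains.
# Chain A survives on disks 1 and 2, chain B on disks 3 and 4.
lost = b"\x0f\xf0"
a1 = b"\x01\x02"
b1 = b"\xaa\x55"
stored = {
    1: a1, 2: xor_blocks([lost, a1]),   # chain A: lost ^ a1 ^ a2 == 0
    3: b1, 4: xor_blocks([lost, b1]),   # chain B: lost ^ b1 ^ b2 == 0
}
chains = [(1, 2), (3, 4)]
utilization = {1: 0.9, 2: 0.2, 3: 0.4, 4: 0.5}  # made-up sample loads

chain, rebuilt = rebuild(chains, utilization, stored.__getitem__)
```

With these sample loads the sketch avoids disk 1, which is the busiest, and rebuilds the block from chain B. A periodic scheme would simply re-run the selection with fresh utilization figures before each batch of reconstruction reads.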
Keywords/Search Tags: Erasure-Coded Storage System, Storage Reliability, Data Availability, Disk Recovery Technology, I/O Balancing Technology