
A Comprehensive Data Redundancy Policy For High Reliable Storage Systems

Posted on: 2019-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Brian Alberto Ignacio Reyes
GTID: 2428330590467359
Subject: Computer Science and Technology
Abstract:
A tiered storage system places data according to access frequency. The most frequently accessed data goes on fast storage devices (the hot data layer), moderately accessed data goes on medium-speed devices (the warm data layer), and rarely accessed data goes on the slowest devices in the system (the cold data layer). Distributed file systems usually rely on replication to provide both high reliability and availability, storing three replicas on different nodes of the cluster; this approach is used to build the hot data layer. Erasure codes (EC) such as Reed-Solomon (RS) are increasingly used to reduce storage overhead further, at the expense of lower I/O performance and data availability, so erasure coding is mainly suited to storing cold data (the cold data layer).

Existing heterogeneous storage systems use triple replication, erasure coding, or a combination of both. Combined implementations have shown a large gap between replication and erasure coding in I/O performance and storage cost, which leads to high system overhead whenever data is transformed from one form to the other. Existing approaches are therefore poorly suited to implementing a warm data layer, especially in tiered storage systems.

To address this problem, we introduce WarmCache, a new data layer for warm data that keeps one copy of the data in an erasure-coded layer and another copy in an in-memory layer. The erasure-coded copy ensures data reliability, while the in-memory copy provides fast I/O. To demonstrate the effectiveness of WarmCache, we implement it in Alluxio and Hadoop and measure I/O performance and storage overhead using TestDFSIO. The results show that, compared with traditional approaches (replication, EC, and their combination), our solution improves the ratio of performance to storage overhead by up to 10.2× over erasure coding and 3.63× over replication.
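The following Python sketch puts the storage-cost trade-off described above into concrete terms. It compares raw capacity per logical byte for the three redundancy schemes and routes data to a tier by access rate. The sketch rests on assumptions that do not come from the thesis. RS(6,3) stands in as the erasure code, matching the layout of HDFS's RS-6-3-1024k policy. WarmCache is modeled as one RS(6,3) copy on disk plus one full copy in memory. The access-rate thresholds in the hypothetical `assign_tier` helper are illustrative. `perf_per_overhead` is only one plausible reading of the "performance over storage overhead" metric, not the thesis's own definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Raw bytes stored per logical byte, split by medium (hypothetical model)."""
    name: str
    disk_overhead: float    # raw disk bytes per logical byte
    memory_overhead: float  # raw memory bytes per logical byte

    @property
    def total_overhead(self) -> float:
        return self.disk_overhead + self.memory_overhead


def replication(copies: int = 3) -> Scheme:
    # Each replica is a full copy stored on a different node's disk.
    return Scheme(f"{copies}x replication", float(copies), 0.0)


def reed_solomon(k: int = 6, m: int = 3) -> Scheme:
    # Each stripe holds k data blocks plus m parity blocks.
    return Scheme(f"RS({k},{m})", (k + m) / k, 0.0)


def warm_cache(k: int = 6, m: int = 3) -> Scheme:
    # Assumed model of WarmCache: one erasure-coded copy on disk
    # plus one full copy held in memory.
    return Scheme(f"WarmCache (RS({k},{m}) + memory copy)", (k + m) / k, 1.0)


def assign_tier(accesses_per_day: float, hot: float = 100.0, warm: float = 1.0) -> str:
    # Hypothetical thresholds: the abstract gives no concrete cut-offs.
    if accesses_per_day >= hot:
        return "hot"
    if accesses_per_day >= warm:
        return "warm"
    return "cold"


def perf_per_overhead(throughput_mb_s: float, scheme: Scheme) -> float:
    # One assumed reading of "performance over storage overhead":
    # measured throughput divided by total raw-capacity overhead.
    return throughput_mb_s / scheme.total_overhead


# Hypothetical mapping of each tier to a redundancy scheme.
TIER_SCHEME = {"hot": replication(), "warm": warm_cache(), "cold": reed_solomon()}

if __name__ == "__main__":
    for rate in (500.0, 10.0, 0.1):
        tier = assign_tier(rate)
        s = TIER_SCHEME[tier]
        print(f"{rate:>6} accesses/day -> {tier:<4} -> {s.name}: "
              f"disk {s.disk_overhead:.2f}x, memory {s.memory_overhead:.2f}x, "
              f"total {s.total_overhead:.2f}x")
```

Under these assumptions, WarmCache uses 2.5× raw capacity (1.5× on disk plus 1.0× in memory). That falls between triple replication's 3.0× and plain RS(6,3)'s 1.5×. Any throughput passed to `perf_per_overhead` would have to come from real measurements, such as TestDFSIO runs.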
Keywords: Erasure Codes, Storage Overhead, I/O Performance, Replication, Cache