
Enabling Efficient and Dependable Clustered File Systems through New Erasure Coding Techniques

Posted on: 2016-08-18
Degree: Ph.D.
Type: Thesis
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Li, Runhui
Full Text: PDF
GTID: 2478390017477084
Subject: Computer Science
Abstract/Summary:
Clustered file systems (CFSes) are widely adopted by enterprises to provide high storage capacity and data availability. A CFS is composed of nodes (i.e., hosts) interconnected via a network. Since these components are prone to failures, it is crucial for a CFS to guarantee data availability, and CFSes therefore store data redundantly to tolerate failures. Two commonly deployed methods for storing redundant data are replication, which keeps multiple replicas of each data block, and erasure coding, which encodes a number of data blocks into a set of encoded blocks such that any sufficiently large subset of the encoded blocks can reconstruct the original data blocks. Although simple and straightforward to implement, replication consumes a huge amount of storage capacity. As storage overhead has become a major bottleneck in recent years, CFSes have increasingly deployed erasure coding as an alternative to replication to reduce storage overhead. However, a major drawback of erasure coding is its high network bandwidth consumption during failure recovery, which makes it challenging to maintain both failure recovery performance and the performance of upper-layer applications in the presence of component failures. Better integration of erasure coding is therefore essential for today's CFSes. In this thesis, we aim to improve both the performance and the dependability of erasure-coded CFSes. We study three problems from different perspectives and propose three systems that boost performance and guarantee reliability of erasure-coded CFSes.

In the first part of the thesis, we study the multi-node failure recovery problem of regenerating codes. Data availability is critical in distributed storage systems, especially since node failures are prevalent in practice. A key requirement is to minimize the amount of data transferred among nodes when recovering the lost or unavailable data of failed nodes. We explore recovery solutions based on regenerating codes, which are designed to provide fault-tolerant storage with minimum repair bandwidth using the concept of network coding. Existing optimal regenerating codes are designed for single node failures. We build a system called CORE, which augments existing optimal regenerating codes to recover a general number of failures, both single and concurrent. We show theoretically that CORE achieves the minimum possible bandwidth in most cases. We implement a CORE prototype and evaluate it atop an HDFS cluster testbed with up to 20 storage nodes, demonstrating that the prototype conforms to our theoretical findings and achieves bandwidth savings over the conventional recovery approach based on erasure codes.
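To illustrate why regenerating codes reduce repair traffic, the following back-of-the-envelope comparison uses the well-known single-failure repair bandwidth of minimum-storage regenerating (MSR) codes; the parameters (M, n, k, d) are illustrative and not taken from the thesis, and CORE's multi-failure generalization is not modeled here.

```python
# Back-of-the-envelope repair-bandwidth comparison for a file of size M
# stored with an (n, k) code; parameters are illustrative, not from the
# thesis. Conventional erasure-code recovery downloads k blocks (size M/k
# each) to rebuild one lost block, i.e., M in total. Minimum-storage
# regenerating (MSR) codes repair a single failed node by downloading
# d * M / (k * (d - k + 1)) from d surviving helpers (Dimakis et al.).

def conventional_repair_bw(M: float, k: int) -> float:
    """Fetch any k blocks of size M/k each, then re-encode."""
    return k * (M / k)  # = M

def msr_repair_bw(M: float, k: int, d: int) -> float:
    """Single-failure MSR repair from d >= k helpers."""
    assert d >= k, "MSR repair needs at least k helpers"
    return d * M / (k * (d - k + 1))

if __name__ == "__main__":
    M, n, k = 1.0, 10, 5
    d = n - 1                      # repair from all surviving nodes
    conv = conventional_repair_bw(M, k)
    msr = msr_repair_bw(M, k, d)
    print(f"conventional: {conv:.2f} x file size")
    print(f"MSR repair:   {msr:.2f} x file size "
          f"({100 * (1 - msr / conv):.0f}% saved)")
```

For n = 10, k = 5, and d = 9 helpers, MSR repair moves 0.36x the file size instead of 1.0x, a 64% saving; CORE's contribution is to extend such repairs to concurrent failures while keeping bandwidth near the theoretical minimum.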
In the second part of the thesis, we study the performance of upper-layer applications running on top of erasure-coded CFSes. Modern clustered storage systems increasingly adopt erasure coding to reduce the storage overhead of traditional 3-way replication, yet it remains an open issue how to tailor the data analytics paradigm to erasure-coded storage, especially when the storage system operates in failure mode. We propose degraded-first scheduling, a new MapReduce scheduling scheme that improves MapReduce performance on erasure-coded clustered storage systems in failure mode. Its main idea is to launch degraded tasks earlier so as to leverage otherwise unused network resources (a simplified scheduling sketch is given below). We conduct mathematical analysis and discrete-event simulation to show the performance gain of degraded-first scheduling over Hadoop's default locality-first scheduling. We further implement degraded-first scheduling on Hadoop and conduct testbed experiments on a 13-node cluster, showing that degraded-first scheduling reduces the MapReduce runtime relative to locality-first scheduling.

In the last part of the thesis, we study efficient and reliable transitions of data from replication to erasure coding. To balance performance and storage efficiency, modern CFSes often first store data with random replication (i.e., distributing replicas across randomly selected nodes) and later encode the replicated data with erasure coding. We argue that random replication, while commonly used, does not take erasure coding into account and hence raises both performance and availability issues for the subsequent encoding operation. We propose encoding-aware replication, which carefully places replicas so as to (i) avoid cross-rack downloads of data blocks during encoding, (ii) preserve availability without data relocation after encoding, and (iii) maintain load balancing as in random replication (a placement sketch is also given below). We implement encoding-aware replication on HDFS and show via testbed experiments that it achieves significant encoding throughput gains over random replication. We also show via discrete-event simulations that encoding-aware replication remains effective under various parameter choices in a large-scale setting, and that it distributes replicas as evenly as random replication.
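To make the second part's main idea concrete, here is a deliberately simplified, hypothetical sketch of degraded-first task assignment. Proportional pacing is one simple way to "launch degraded tasks earlier"; the class and method names are invented for illustration, and the thesis's actual Hadoop scheduler differs in detail.

```python
# Hypothetical sketch of degraded-first task assignment. "Degraded" map
# tasks read blocks of a failed node and must reconstruct their input
# over the network; locality-first scheduling defers them, so their
# network traffic bursts at the end of a job. Launching them early
# spreads that traffic over bandwidth that local tasks leave idle.
from collections import deque

class DegradedFirstScheduler:
    def __init__(self, local_tasks, degraded_tasks):
        self.local = deque(local_tasks)
        self.degraded = deque(degraded_tasks)
        total = len(self.local) + len(self.degraded)
        # target share of degraded launches among all launches so far
        self.frac = len(self.degraded) / total if total else 0.0
        self.launched = 0
        self.launched_degraded = 0

    def next_task(self):
        """Return the next map task for a free slot, or None when done.

        Pacing rule: launch a degraded task whenever degraded launches
        lag behind their overall share of the job; otherwise prefer a
        (network-cheap) locally served task.
        """
        lagging = (self.launched == 0 or
                   self.launched_degraded / self.launched < self.frac)
        if self.degraded and (lagging or not self.local):
            task = self.degraded.popleft()
            self.launched_degraded += 1
        elif self.local:
            task = self.local.popleft()
        else:
            return None
        self.launched += 1
        return task

if __name__ == "__main__":
    sched = DegradedFirstScheduler(local_tasks=[f"L{i}" for i in range(8)],
                                   degraded_tasks=["D0", "D1"])
    # degraded tasks come out interleaved early instead of bunched last
    print([sched.next_task() for _ in range(10)])
```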
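Similarly, the placement rules (i) and (ii) of encoding-aware replication from the third part can be sketched as follows. The cluster model, the names, and the random load-balancing choices are hypothetical simplifications for illustration, not the thesis's HDFS implementation.

```python
# Hypothetical sketch of encoding-aware replica placement for one stripe
# of k data blocks. Cluster model: `racks` maps a rack id to its node ids.
# Rules sketched:
#   (i)   every block keeps one replica in a single "core" rack, so the
#         encoding operation reads all k blocks without cross-rack traffic;
#   (ii)  each block also keeps a replica in its own distinct non-core
#         rack, so after encoding (when extra replicas are deleted) the
#         stripe's blocks already reside in k different racks;
#   (iii) the core rack is drawn at random per stripe to balance load.
import random

def place_stripe(racks: dict, k: int, replication: int = 3):
    core_rack = random.choice(list(racks))            # (iii) load balancing
    assert len(racks[core_rack]) >= k, "core rack needs >= k nodes"
    core_nodes = random.sample(racks[core_rack], k)   # (i) one node per block
    other_racks = [r for r in racks if r != core_rack]
    assert len(other_racks) >= k, "need >= k non-core racks for (ii)"
    survivor_racks = random.sample(other_racks, k)    # (ii) distinct racks
    placement = []
    for b in range(k):
        replicas = [core_nodes[b], random.choice(racks[survivor_racks[b]])]
        # any remaining replicas fall back to random placement
        spare = [r for r in other_racks if r != survivor_racks[b]]
        for rack in random.sample(spare, replication - 2):
            replicas.append(random.choice(racks[rack]))
        placement.append(replicas)
    return core_rack, placement

if __name__ == "__main__":
    racks = {f"rack{i}": [f"r{i}n{j}" for j in range(4)] for i in range(8)}
    core, layout = place_stripe(racks, k=4)
    print("core rack:", core)
    for b, nodes in enumerate(layout):
        print(f"block {b}: replicas on {nodes}")
```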