
Research On Erasure Code Oriented Block Management For Distributed File System

Posted on: 2018-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Guan
GTID: 2348330512988936
Subject: Computer system architecture

Abstract/Summary:
Many large-scale distributed storage systems now use erasure coding, which greatly reduces storage overhead while providing the same level of reliability as replication. Most research focuses on the degraded-read problem of erasure codes, but the underlying support for erasure coding has received far less attention. This prompted us to design and build Ecobm, a new block management scheme that addresses the requirements of both replication and erasure coding and can easily be integrated into an existing replica-based system.

After careful analysis of real-world workloads, Ecobm adopts an offline encoding strategy and a striped block layout. To balance storage overhead against the complexity of block management, it allows an erasure group to span multiple files but minimizes the number of files linked to each group. A state machine describes the lifecycle of a block and guides the implementation of the block management module. Block placement within an erasure group is guaranteed by a bipartite graph model, which greatly reduces IO cost during block relocation. Only long-term immutable data is stored under erasure coding, which reduces storage overhead. For hot encoded blocks, the system temporarily increases the replication factor according to a novel data structure named RRA, which records the real-time access temperature of data, so that upper-layer applications benefit from data locality.

A prototype of Ecobm has been implemented on HDFS. Experimental results show that it reduces storage overhead even in scenarios with many small files, while providing better data locality for hot datasets: Ecobm achieves an overall storage overhead of 87.1% even when the DFS holds a large number of small files. The dynamic replication mechanism also significantly reduces the makespan of MapReduce jobs such as PageRank and TF-IDF.
Keywords: Erasure Coding, Distributed File System, Data Locality, Block Management
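The abstract says a state machine describes the lifecycle of a block, but it does not list the states. Below is a minimal sketch of how such a table-driven lifecycle might look. The state names (`REPLICATED`, `ENCODING`, `ENCODED`, `HOT_REPLICATED`, `DELETED`) and event names are hypothetical. They are chosen only to reflect the offline encoding and temporary re-replication described above.

```python
from enum import Enum, auto


class BlockState(Enum):
    # Hypothetical states; the thesis's actual state set is not given in the abstract.
    REPLICATED = auto()      # newly written data, stored as full replicas
    ENCODING = auto()        # selected for offline encoding, parity being computed
    ENCODED = auto()         # member of an erasure group, extra replicas dropped
    HOT_REPLICATED = auto()  # encoded block temporarily given extra replicas
    DELETED = auto()


# (current state, event) -> next state; any pair not listed is an illegal transition.
TRANSITIONS = {
    (BlockState.REPLICATED, "start_encode"): BlockState.ENCODING,
    (BlockState.ENCODING, "encode_done"): BlockState.ENCODED,
    (BlockState.ENCODING, "encode_failed"): BlockState.REPLICATED,
    (BlockState.ENCODED, "heat_up"): BlockState.HOT_REPLICATED,
    (BlockState.HOT_REPLICATED, "cool_down"): BlockState.ENCODED,
    (BlockState.REPLICATED, "delete"): BlockState.DELETED,
    (BlockState.ENCODED, "delete"): BlockState.DELETED,
    (BlockState.HOT_REPLICATED, "delete"): BlockState.DELETED,
}


class Block:
    def __init__(self, block_id: int) -> None:
        self.block_id = block_id
        self.state = BlockState.REPLICATED  # offline encoding: every block starts replicated

    def fire(self, event: str) -> BlockState:
        """Apply an event, raising ValueError if the transition is not allowed."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    b = Block(1)
    for ev in ("start_encode", "encode_done", "heat_up", "cool_down"):
        print(ev, "->", b.fire(ev).name)
```

All legal transitions live in one table, so a block manager written against it fails loudly on any unplanned path. For example, it rejects deleting a block while its parity is still being computed.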
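The abstract states that a bipartite graph model guarantees block placement within an erasure group and reduces relocation IO, but it does not give the model itself. One plausible reading, offered here only as an assumption, is a bipartite graph with the group's blocks on one side and live DataNodes on the other. An edge marks a node that already holds a copy of a block. A maximum matching then keeps as many blocks as possible in place on distinct nodes, and only unmatched blocks are copied elsewhere. The sketch uses Kuhn's augmenting-path algorithm. The names `holders`, `max_matching`, and `plan_placement` are hypothetical.

```python
from typing import Dict, List, Set


def max_matching(holders: Dict[str, Set[str]]) -> Dict[str, str]:
    """Kuhn's augmenting-path algorithm: match each block to at most one node
    that already holds a copy of it, never using a node twice."""
    node_to_block: Dict[str, str] = {}

    def try_assign(block: str, seen: Set[str]) -> bool:
        for node in sorted(holders[block]):
            if node in seen:
                continue
            seen.add(node)
            # Take the node if it is free, or if its current block can move to another holder.
            if node not in node_to_block or try_assign(node_to_block[node], seen):
                node_to_block[node] = block
                return True
        return False

    for block in sorted(holders):
        try_assign(block, set())
    return {b: n for n, b in node_to_block.items()}


def plan_placement(holders: Dict[str, Set[str]], nodes: List[str]) -> Dict[str, str]:
    """Place every block of one erasure group on a distinct live node, keeping as
    many blocks as possible where a copy already exists (no transfer IO)."""
    placement = max_matching(holders)
    used = set(placement.values())
    free = [n for n in sorted(nodes) if n not in used]
    for block in sorted(holders):
        if block not in placement:
            if not free:
                raise RuntimeError("not enough nodes for a distinct placement")
            placement[block] = free.pop(0)  # this block must be copied over the network
    return placement


if __name__ == "__main__":
    # Data blocks d0, d1 and parity p0 all have a copy on n1; d0 also has one on n2.
    holders = {"d0": {"n1", "n2"}, "d1": {"n1"}, "p0": {"n1"}}
    print(plan_placement(holders, ["n1", "n2", "n3", "n4"]))
```

In this example, a naive greedy pass would keep `d0` on `n1`, which forces both `d1` and `p0` to move. The augmenting path instead shifts `d0` to `n2`, so only `p0` has to be copied.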
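The abstract names RRA as the data structure that records the real-time access temperature of data and drives temporary extra replicas for hot encoded blocks. Its internals are not described there. The sketch below assumes a fixed-size ring of per-interval access counters, where temperature is the number of accesses within the window. It also assumes a hypothetical threshold rule that maps temperature to a target replica count. `AccessWindow`, `target_replicas`, the thresholds, and the base of one copy for an encoded block are all illustrative choices, not the thesis's design.

```python
from typing import Tuple


class AccessWindow:
    """Ring of per-slot access counters covering the last `num_slots` intervals."""

    def __init__(self, num_slots: int, slot_seconds: int) -> None:
        self.num_slots = num_slots
        self.slot_seconds = slot_seconds
        self.counts = [0] * num_slots
        self.current_slot = 0  # absolute index of the newest time slot

    def _advance(self, now: int) -> None:
        slot = now // self.slot_seconds
        # Zero every counter whose slot has fallen out of the window since the last update.
        last = min(slot, self.current_slot + self.num_slots)
        for s in range(self.current_slot + 1, last + 1):
            self.counts[s % self.num_slots] = 0
        self.current_slot = max(self.current_slot, slot)

    def record(self, now: int) -> None:
        self._advance(now)
        self.counts[self.current_slot % self.num_slots] += 1

    def temperature(self, now: int) -> int:
        """Number of accesses within the last `num_slots` slots."""
        self._advance(now)
        return sum(self.counts)


def target_replicas(temperature: int,
                    thresholds: Tuple[int, ...] = (10, 50, 200),
                    base: int = 1,
                    max_extra: int = 3) -> int:
    # Hypothetical rule: one extra replica for each threshold the temperature reaches.
    extra = sum(1 for t in thresholds if temperature >= t)
    return base + min(extra, max_extra)


if __name__ == "__main__":
    w = AccessWindow(num_slots=3, slot_seconds=60)
    for t in range(0, 60, 5):
        w.record(t)
    print(w.temperature(59), target_replicas(w.temperature(59)))
```

A ring of counters keeps memory constant per block and lets old accesses expire automatically. In this sketch, a block manager would compare `target_replicas` with the block's current replica count. It would then schedule extra copies as the block heats up and drop them once it cools down.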