
A Study Of Redundancy To Optimize The Performance Of Distributed Storage System

Posted on: 2015-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
Full Text: PDF
GTID: 2308330464455513
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of data, we have entered the era of Big Data. As traditional relational databases lose the ability to handle extremely large data sets, a range of new technologies has sprung up, including cloud computing and cloud storage. Cloud computing involves a large number of computers connected through a communication network; for example, Google's MapReduce and its open-source implementation Hadoop have emerged as a prevalent paradigm for processing large data sets in data centers. Cloud storage is a model of networked enterprise storage in which data is stored in virtualized storage pools. GFS and HDFS are the major technologies behind cloud storage.

To meet the needs of large-scale storage applications and provide reliable mass data storage services, a distributed storage system must use redundancy to maintain the availability and reliability of its data. Because of the large amount of data, the redundancy mechanism requires the distributed file storage system to provide a huge amount of additional storage space, and there is no better way to avoid this cost. Fortunately, we can exploit this redundancy to optimize the performance of the distributed storage system, for example by reducing energy consumption and minimizing file repair time.

The primary redundancy mechanisms are replication and coding. In this paper, we first use replication to reduce the energy consumption of the system. This work includes two novel components: an Energy-efficient File Replication policy (EFR) and an Energy-efficient Job Scheduling algorithm (EJS), where EFR places file block replicas in a particular way to facilitate the energy-efficient job scheduling designed by EJS. Then, we introduce a flexible regeneration scheme, FTR, which allows providers to generate different amounts of coded data in a distributed and heterogeneous network environment.
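The thesis does not spell out the EFR placement rule here, but the general idea of energy-aware replica placement can be sketched as follows. This is an illustrative sketch only, not the thesis's actual EFR algorithm; the node names, the `active_fraction` parameter, and the greedy load-balancing rule are all assumptions.

```python
# Illustrative sketch of energy-aware replica placement (hypothetical,
# not the thesis's EFR): confine all replicas of each block to a
# designated "active" subset of nodes, so the remaining nodes can be
# put into a low-power state between jobs without losing any data.

def place_replicas(blocks, nodes, active_fraction=0.5, replicas=3):
    """Greedily assign each block's replicas to the active node subset,
    balancing load by current block count per node."""
    n_active = max(replicas, int(len(nodes) * active_fraction))
    active = nodes[:n_active]                # nodes kept powered on
    load = {node: 0 for node in active}
    placement = {}
    for block in blocks:
        # pick the `replicas` least-loaded active nodes for this block
        targets = sorted(active, key=lambda n: load[n])[:replicas]
        for t in targets:
            load[t] += 1
        placement[block] = targets
    return placement, active

placement, active = place_replicas(
    blocks=[f"blk{i}" for i in range(12)],
    nodes=[f"node{i}" for i in range(8)],
)
# All replicas land on the active subset; the other nodes can sleep,
# which is what enables an energy-efficient scheduler to route jobs
# only to powered-on machines.
```

The design point this illustrates is the coupling the abstract describes: placement (EFR) deliberately concentrates data so that scheduling (EJS) can keep jobs on a small powered-on subset of the cluster.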
In this work, we briefly describe the regeneration scheme and focus on developing the Regenerating Code Distributed File System (RCDFS) to realize it. Our experimental results demonstrate that EJS combined with EFR can save up to 50-60% of server energy for a cluster while incurring only negligible delay increases. In addition, FTR runs effectively on RCDFS, and its file repair time is lower than STAR's when network heterogeneity is pronounced.
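The repair-time advantage of coded regeneration can be made concrete with the standard minimum-storage regenerating (MSR) bandwidth bound. The sketch below is not the thesis's FTR scheme; it only illustrates, under assumed parameters, why a regenerating code downloads less data per repair than naively re-fetching the whole file as a conventional erasure code must.

```python
# Repair-bandwidth comparison (illustrative; parameters are assumptions).
# With an (n, k) MSR regenerating code, repairing one lost node contacts
# d surviving helpers, each sending file_size / (k * (d - k + 1)) data.

def msr_repair_bandwidth(file_size, k, d):
    """Total data downloaded to repair one lost node under an MSR code."""
    assert d >= k, "need at least k helpers"
    return d * file_size / (k * (d - k + 1))

B = 1024.0  # MB, hypothetical file size

# A conventional (n, k) erasure code must reconstruct the whole file
# to rebuild one lost fragment:
naive_repair = B                               # 1024.0 MB

# An MSR code with k = 4 and d = 7 helpers downloads far less:
msr_repair = msr_repair_bandwidth(B, k=4, d=7)  # 448.0 MB
```

Less data on the wire per repair is what shortens repair time, and the gap widens as helper links differ in speed, consistent with the abstract's claim that FTR beats STAR when network heterogeneity is pronounced.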
Keywords/Search Tags: Big Data, Distributed Storage System, Hadoop, Replication, Energy Saving, Network Coding, Repair Time