
A Study Of Redundancy To Optimize The Performance Of Distributed Storage System

Posted on: 2015-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
Full Text: PDF
GTID: 2308330464455513
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of data, we have entered the era of Big Data. As traditional relational databases lose the ability to handle extremely large data sets, a range of new technologies has sprung up, including cloud computing and cloud storage. Cloud computing involves a large number of computers connected through a communication network; for example, Google's MapReduce and its open-source implementation Hadoop have emerged as a prevalent paradigm for processing large data sets in data centers. Cloud storage is a model of networked enterprise storage in which data is stored in virtualized storage pools. GFS and HDFS are the major technologies behind cloud storage.

To meet the needs of large-scale storage applications and provide reliable mass data storage services, a distributed storage system must use redundancy to maintain the availability and reliability of its data. Because of the large amount of data, the redundancy mechanism requires the distributed file storage system to provide a huge amount of additional storage space, and there is no better way to avoid this cost. Fortunately, we can exploit this redundancy to optimize the performance of the distributed storage system, for example by reducing energy consumption and minimizing file repair time.

The primary redundancy mechanisms are replication and coding. In this paper, we first use replication to reduce the energy consumption of the system. This work includes two novel components: an Energy-efficient File Replication policy (EFR) and an Energy-efficient Job Scheduling algorithm (EJS), where EFR places file block replicas in a particular way to facilitate the energy-efficient job scheduling designed by EJS. Then, we introduce a flexible regeneration scheme, FTR, which allows providers to generate different amounts of coded data in a distributed and heterogeneous network environment.
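The thesis does not spell out the EFR placement rule here, but the general idea of energy-aware replica placement can be sketched as follows. This is an illustrative sketch only, not the thesis's actual EFR algorithm; the node names, the `active_fraction` parameter, and the greedy load-balancing rule are all assumptions.

```python
# Illustrative sketch of energy-aware replica placement (hypothetical,
# not the thesis's EFR): confine all replicas of each block to a
# designated "active" subset of nodes, so the remaining nodes can be
# put into a low-power state between jobs without losing any data.

def place_replicas(blocks, nodes, active_fraction=0.5, replicas=3):
    """Greedily assign each block's replicas to the active node subset,
    balancing load by current block count per node."""
    n_active = max(replicas, int(len(nodes) * active_fraction))
    active = nodes[:n_active]                # nodes kept powered on
    load = {node: 0 for node in active}
    placement = {}
    for block in blocks:
        # pick the `replicas` least-loaded active nodes for this block
        targets = sorted(active, key=lambda n: load[n])[:replicas]
        for t in targets:
            load[t] += 1
        placement[block] = targets
    return placement, active

placement, active = place_replicas(
    blocks=[f"blk{i}" for i in range(12)],
    nodes=[f"node{i}" for i in range(8)],
)
# All replicas land on the active subset; the other nodes can sleep,
# which is what enables an energy-efficient scheduler to route jobs
# only to powered-on machines.
```

The design point this illustrates is the coupling the abstract describes: placement (EFR) deliberately concentrates data so that scheduling (EJS) can keep jobs on a small powered-on subset of the cluster.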
In this work, we briefly describe the regeneration scheme and focus on developing the Regenerating Code Distributed File System (RCDFS) to realize it. Our experimental results demonstrate that EJS combined with EFR can save up to 50-60% of server energy for a cluster while incurring only negligible delay increases. In addition, FTR runs effectively on RCDFS, and its file repair time is lower than STAR's when network heterogeneity is pronounced.
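The repair-time advantage of coded regeneration can be made concrete with the standard minimum-storage regenerating (MSR) bandwidth bound. The sketch below is not the thesis's FTR scheme; it only illustrates, under assumed parameters, why a regenerating code downloads less data per repair than naively re-fetching the whole file as a conventional erasure code must.

```python
# Repair-bandwidth comparison (illustrative; parameters are assumptions).
# With an (n, k) MSR regenerating code, repairing one lost node contacts
# d surviving helpers, each sending file_size / (k * (d - k + 1)) data.

def msr_repair_bandwidth(file_size, k, d):
    """Total data downloaded to repair one lost node under an MSR code."""
    assert d >= k, "need at least k helpers"
    return d * file_size / (k * (d - k + 1))

B = 1024.0  # MB, hypothetical file size

# A conventional (n, k) erasure code must reconstruct the whole file
# to rebuild one lost fragment:
naive_repair = B                               # 1024.0 MB

# An MSR code with k = 4 and d = 7 helpers downloads far less:
msr_repair = msr_repair_bandwidth(B, k=4, d=7)  # 448.0 MB
```

Less data on the wire per repair is what shortens repair time, and the gap widens as helper links differ in speed, consistent with the abstract's claim that FTR beats STAR when network heterogeneity is pronounced.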
Keywords/Search Tags: Big Data, Distributed Storage System, Hadoop, Replication, Energy Saving, Network Coding, Repair Time