
Research On Data Encoding Optimization And Data Deduplication In Cloud Storage

Posted on: 2014-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: X K Qu
GTID: 2348330473453785
Subject: Computer application technology
Abstract/Summary:
Applications in cloud storage demand ever more storage space, growing from the GB scale to TB and even EB. As data volumes grow, the required storage space grows with them, and so does the corresponding energy consumption. In a cloud storage environment it is therefore worthwhile to study how erasure coding and data deduplication can improve the utilization of storage space.

This thesis reviews existing research on cloud storage. It finds that erasure coding saves more storage space and network bandwidth than multi-copy (replication) storage. Data stored on Hadoop, a cloud computing platform, often contains a large amount of duplicate data. Data deduplication can identify these duplicates and avoid storing them again, which improves storage-space utilization.

Cauchy Reed-Solomon (CRS) coding is a mainstream erasure code. To improve the efficiency of CRS coding and to add data deduplication to a cloud storage system, the following research is carried out:
(1) A selection framework is proposed to optimize CRS coding. Any storage system that uses CRS as its disaster-recovery strategy can use the optimal schedule scheme produced by this framework to improve coding efficiency.
(2) The optimal schedule scheme is integrated into Hadoop to optimize CRS coding in a cloud storage environment.
(3) Data deduplication is implemented in each Hadoop datanode to improve storage-space utilization.

Hadoop is the main experimental cloud storage platform for this thesis. On this basis, CRS coding is optimized and data deduplication is added to cloud storage. This improves both the efficiency of CRS coding and the utilization of storage space.
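The abstract does not state the field size, the Cauchy matrix construction, or how the selection framework scores its "optimal schedule scheme". As a hedged illustration only, the Python sketch below makes several assumptions: GF(2^8) with primitive polynomial 0x11D and Cauchy sets X = [o, o+m) and Y = [o+m, o+m+k). Under these assumptions it does three things:

- It builds an m × k Cauchy matrix and encodes k data blocks into m parity blocks.
- It recovers one lost data block from the first parity block.
- It scores each matrix by the number of ones in its w × w bit-matrix expansion, a common proxy for XOR cost in the CRS literature, and keeps the cheapest of a few candidates.

For scale, with k = 4 and m = 2 the raw storage is (4 + 2) / 4 = 1.5 times the user data, versus 3 times for three-way replication; both tolerate two lost blocks. The toy below only demonstrates single-erasure recovery. All helper names (`select_matrix`, `recover_one`, etc.) are hypothetical. The toy selection is not the thesis's framework. For readability the sketch multiplies byte by byte in GF(2^8) rather than XORing bit-matrix packets, as a real CRS encoder would.

```python
# Toy Cauchy Reed-Solomon (CRS) sketch over GF(2^8); not the thesis's code.
# Assumptions: w = 8, primitive polynomial 0x11D, Cauchy sets
# X = [o, o+m) and Y = [o+m, o+m+k), and "ones in the bit-matrix
# expansion" as a proxy for XOR cost when choosing between matrices.
W = 8
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

# Log/antilog tables for GF(2^8) with generator 2.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse in GF(2^8)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return EXP[(255 - LOG[a]) % 255]


def cauchy_matrix(xs, ys):
    """m x k matrix with entry 1 / (x_i + y_j); addition in GF(2^w) is XOR."""
    return [[gf_inv(x ^ y) for y in ys] for x in xs]


def bitmatrix_ones(e):
    """Ones in the w x w bit matrix of e; column c holds the bits of e * 2^c."""
    return sum(bin(gf_mul(e, 1 << c)).count("1") for c in range(W))


def matrix_cost(mat):
    return sum(bitmatrix_ones(e) for row in mat for e in row)


def select_matrix(k, m, max_offset=8):
    """Hypothetical toy selection: keep the cheapest of several X/Y choices."""
    best = None
    for o in range(max_offset):
        mat = cauchy_matrix(range(o, o + m), range(o + m, o + m + k))
        cost = matrix_cost(mat)
        if best is None or cost < best[0]:
            best = (cost, mat)
    return best


def encode(mat, data_blocks):
    """parity_i[b] = XOR over j of mat[i][j] * data_j[b] (byte-wise GF math)."""
    size = len(data_blocks[0])
    parity = []
    for row in mat:
        p = bytearray(size)
        for coef, block in zip(row, data_blocks):
            for b in range(size):
                p[b] ^= gf_mul(coef, block[b])
        parity.append(bytes(p))
    return parity


def recover_one(mat, data_blocks, parity, lost):
    """Rebuild data block `lost` from parity block 0 and the surviving data."""
    row = mat[0]
    acc = bytearray(parity[0])
    for j, block in enumerate(data_blocks):
        if j != lost:
            for b in range(len(acc)):
                acc[b] ^= gf_mul(row[j], block[b])
    inv = gf_inv(row[lost])  # acc now equals row[lost] * data[lost]
    return bytes(gf_mul(inv, v) for v in acc)


if __name__ == "__main__":
    k, m = 4, 2
    cost, mat = select_matrix(k, m)
    data = [bytes(17 * j + b for b in range(8)) for j in range(k)]
    parity = encode(mat, data)
    survivors = [d if j != 2 else None for j, d in enumerate(data)]
    assert recover_one(mat, survivors, parity, lost=2) == data[2]
    print("chosen matrix cost (bit-matrix ones):", cost)
```

Counting ones is only a cost proxy. A schedule-based encoder can reuse intermediate XOR results and need fewer operations than the raw count suggests, which is the kind of gain a schedule-optimization step targets.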
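The abstract says data deduplication is implemented in each Hadoop datanode. It does not describe the chunking method, the fingerprint function, or the index structure. The Python sketch below is a minimal, hypothetical illustration built on assumed choices: fixed-size chunks, SHA-256 fingerprints, and an in-memory index with reference counts. It is not HDFS code and uses no Hadoop API. Each block is kept as a "recipe" of chunk fingerprints, so a chunk that the node has already stored costs no additional space.

```python
import hashlib


class DedupStore:
    """Toy chunk store for one datanode (hypothetical; not the thesis's code).

    Assumptions: fixed-size chunking and SHA-256 fingerprints, with
    fingerprint equality treated as content equality.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}    # fingerprint -> chunk bytes, stored once
        self.refcount = {}  # fingerprint -> number of references to it
        self.recipes = {}   # block id -> ordered fingerprints

    def put(self, block_id, data):
        if block_id in self.recipes:  # overwrite: drop old references first
            self.delete(block_id)
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # only unseen content uses new space
                self.chunks[fp] = chunk
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        self.recipes[block_id] = recipe

    def get(self, block_id):
        return b"".join(self.chunks[fp] for fp in self.recipes[block_id])

    def delete(self, block_id):
        for fp in self.recipes.pop(block_id):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:  # last reference gone: reclaim space
                del self.refcount[fp]
                del self.chunks[fp]

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("blk_1", b"AAAABBBBAAAA")
    store.put("blk_2", b"BBBBAAAACCCC")
    print(store.stored_bytes())  # 12: only AAAA, BBBB, CCCC are stored
```

Reference counts let a chunk be reclaimed only when no block still refers to it. A real datanode would also need a persistent fingerprint index. Content-defined chunking usually finds more duplicates than fixed-size chunking when data is shifted by insertions.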
Keywords/Search Tags: cloud storage, erasure coding, deduplication, Hadoop