Research On Cloud Storage Based On Hadoop Distributed File System

Posted on:2012-10-30

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Xu

Full Text:PDF

GTID:2178330338984128

Subject:Computer system architecture

Abstract/Summary:

Cloud computing is a technology derived from grid computing, parallel computing and distributed computing. Using Internet storage, virtualization and load balancing, cloud computing moves clients' tasks to data centers on the Internet to take advantage of their large computing capacity. Cloud storage derives from cloud computing. It integrates various storage devices on the Internet and makes them work together as a whole to offer services. Among the numerous open source cloud computing platforms, Hadoop, developed by the Apache Software Foundation based on the ideas of GFS, has attracted close attention. HDFS (Hadoop Distributed File System) focuses on mass data management for cloud storage. Because HDFS offers strong scalability, high reliability and low cost, it has become a hot topic in cloud storage research. Even a small team can easily build a cluster in a laboratory, and data can be managed easily with the mature file manager.

Cloud storage focuses on how to ensure data availability, integrity, serviceability and durability efficiently. Two techniques are widely used: pure replication and erasure coding. Pure replication provides services with low latency, but it occupies more disk space and consumes more transmission bandwidth when data centers in different regions are used. Erasure coding improves durability and uses less disk space, but it consumes more resources and incurs longer latency. How can the advantages of the two methods be combined? The purpose of this research is to build a new architecture that keeps the advantages of both and reduces their disadvantages.

The paper designs and implements a novel distributed architecture called REPERA, which combines pure replication and erasure coding on top of HDFS. As a type of cloud storage, REPERA takes advantage of HDFS and of both storage methods. Besides extensibility, reliability and mass data management, it balances latency and durability and saves disk space. In addition, clients can set some key values to configure the system according to their own conditions and application environments. The paper first analyzes the HDFS architecture in detail, then designs and implements the REPERA architecture. Finally, experimental data show the feasibility and efficiency of REPERA.
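The trade-off between pure replication and erasure codes described in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch and not REPERA itself: it assumes a maximum distance separable code (for example Reed-Solomon) in which any k of the k+m stored blocks rebuild the data, and the parameters (3 copies, a 6+3 code) as well as all names in the code are illustrative choices that do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described by the number of blocks it stores."""
    name: str
    data_blocks: int   # blocks that hold original data
    total_blocks: int  # data blocks plus redundant blocks

    @property
    def storage_overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return self.total_blocks / self.data_blocks

    @property
    def tolerated_losses(self) -> int:
        # Blocks that can be lost without losing data (assumes an MDS code).
        return self.total_blocks - self.data_blocks

    @property
    def blocks_read_to_repair(self) -> int:
        # Blocks that must be read to rebuild one lost block:
        # one surviving copy for replication, k blocks for an MDS code.
        return self.data_blocks


def replication(copies: int) -> Scheme:
    """Pure replication: one data block stored `copies` times."""
    return Scheme(f"{copies}-way replication", 1, copies)


def erasure_code(k: int, m: int) -> Scheme:
    """Erasure code with k data blocks and m parity blocks (illustrative)."""
    return Scheme(f"({k}+{m}) erasure code", k, k + m)


if __name__ == "__main__":
    for scheme in (replication(3), erasure_code(6, 3)):
        print(
            f"{scheme.name}: overhead {scheme.storage_overhead:.2f}x, "
            f"tolerates {scheme.tolerated_losses} lost blocks, "
            f"reads {scheme.blocks_read_to_repair} block(s) per repair"
        )
```

Under these assumptions, replication stores more raw data but repairs from a single copy, while the erasure code stores less and tolerates more losses at the cost of reading several blocks per repair, which is the latency and resource cost the abstract refers to.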

Keywords/Search Tags:

cloud storage, pure replication, erasure code, HDFS, REPERA

Related items

1	Research On Data Replication Technology Based On HDFS Storage System
2	Research On Cloud Storage Strategy Based On Erasure Code
3	Design And Implementation Of File Multi-Cloud Secure Storage System Based On Web And Erasure Code
4	Modelling And Analysis Of Cloud Platform Based On Erasure Code Storage Mechanism
5	Research On High Utilization Rate And Strong Scalability Of HDFS Storage
6	Design And Implementation Of The High-performance Cloud Storage System Based On Erasure Code
7	The Research And Implementation Of Replication Management In HDFS
8	Research And Implementation Of Erasure Code Based On Non-uniform Protection Strategy In Cloud Storage
9	Study Of Data Replication Technology In Cloud Storage Environment
10	Research Of Cloud Storage System Optimization Based On HDFS