
Research On Cloud Storage Based On Hadoop Distributed File System

Posted on: 2012-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Xu
Full Text: PDF
GTID: 2178330338984128
Subject: Computer system architecture
Abstract/Summary:
Cloud computing is a technology derived from grid computing, parallel computing and distributed computing. Using Internet storage, virtualization and load balancing, cloud computing moves clients' tasks to data centers on the Internet to take advantage of their large computing capacity. Cloud storage derives from cloud computing. It integrates various storage devices on the Internet and makes them cooperate as a whole to offer services. Among the numerous open source cloud computing platforms, Hadoop, developed by the Apache Software Foundation based on the ideas of GFS, has attracted close attention. HDFS (Hadoop Distributed File System) focuses on mass data management for cloud storage. Because HDFS offers strong scalability, high reliability and low cost, it has become a hot topic in cloud storage research. Even a small team can easily build a cluster in a laboratory, and data can be easily managed with its mature file manager.

Cloud storage focuses on how to ensure data availability, integrity, serviceability and durability efficiently. Two techniques are widely used: pure replication and erasure coding. Pure replication provides low-latency service but occupies more disk space and, when replicas are placed in data centers in different regions, more transmission bandwidth. Erasure coding improves durability and uses less disk space, but consumes more resources and incurs longer latency. How can the advantages of the two methods be combined? The purpose of this research is to build a new architecture that has the advantages of both while reducing their disadvantages.

The paper designs and implements a novel distributed architecture called REPERA, which combines pure replication and erasure coding on top of HDFS. As a type of cloud storage, REPERA inherits the advantages of HDFS and of both storage methods. Besides extensibility, reliability and mass data management, it balances latency and durability and saves disk space. In addition, clients can set several key values to configure the system according to their own conditions and application environments. The paper first analyzes the HDFS architecture in detail, then designs and implements the REPERA architecture. Finally, experimental data show the feasibility and efficiency of REPERA.
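The disk-space trade-off between the two methods can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: the names `Scheme`, `replication` and `erasure_code` are hypothetical, the (6+3) layout is an arbitrary example rather than a REPERA parameter, and the erasure code is assumed to be a maximum-distance-separable code (such as Reed-Solomon) in which any k of the k+m blocks suffice to rebuild the data. The 3-copy case corresponds to the default HDFS replication factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Redundancy scheme described per stripe of user data (hypothetical helper)."""
    name: str
    raw_blocks: int        # blocks written to disk per stripe, redundancy included
    data_blocks: int       # blocks of actual user data per stripe
    tolerated_losses: int  # blocks that can be lost without losing data

    @property
    def overhead(self) -> float:
        # Raw storage consumed per unit of user data.
        return self.raw_blocks / self.data_blocks


def replication(copies: int) -> Scheme:
    # Each data block is stored `copies` times; one surviving copy is enough.
    return Scheme(f"{copies}x replication", copies, 1, copies - 1)


def erasure_code(k: int, m: int) -> Scheme:
    # Assumes an MDS code: k data blocks plus m parity blocks,
    # recoverable from any k of the k + m blocks.
    return Scheme(f"erasure code ({k}+{m})", k + m, k, m)


if __name__ == "__main__":
    for scheme in (replication(3), erasure_code(6, 3)):
        print(
            f"{scheme.name}: {scheme.overhead:.2f}x raw storage, "
            f"survives {scheme.tolerated_losses} lost blocks"
        )
```

Under these assumptions, three-way replication stores 3 blocks per data block and survives 2 losses, while the (6+3) code stores 1.5 blocks per data block and survives 3 losses. The price is that rebuilding a lost block requires reading k other blocks and decoding them, which is the source of the extra resource use and latency mentioned above.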
Keywords/Search Tags:cloud storage, pure replication, erasure code, HDFS, REPERA