
The Optimization And Implementation Of HDFS High Availability Scheme

Posted on: 2019-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: W L Hu
Full Text: PDF
GTID: 2428330566499368
Subject: Computer technology
Abstract/Summary:
With the vigorous development of the Internet, more and more data are generated on back-end servers, and storing these massive data scientifically has become one of the major challenges facing the industry. In recent years, with the iteration and development of big data technologies, the Hadoop Distributed File System (HDFS) has been widely adopted and validated. However, the current version of HDFS, which adopts a master-slave architecture and a multi-replica mechanism, only meets basic functional needs and still leaves room for optimization with respect to the Single Point of Failure (SPOF) problem and storage utilization. In view of these two problems, the main work of this thesis is as follows:

(1) An HDFS storage strategy based on locally repairable codes is proposed. Analysis of the current version of HDFS shows that the file system avoids data loss by creating multiple copies, and this replica strategy consumes a large amount of storage in the present era of massive data. This thesis therefore proposes an HDFS data storage strategy based on a locally repairable erasure code. Compared with the replica strategy, the algorithm significantly reduces disk storage overhead; and unlike Reed-Solomon (RS) codes, reconstructing failed data does not require pulling all the remaining data from every network node. Compared with EVENODD codes and X-codes, the improved algorithm is also more flexible in the number of data nodes it supports.

(2) A flat, highly available NameNode model is proposed. HDFS adopts a master-slave architecture composed mainly of one NameNode and multiple DataNodes: the NameNode manages all of the cluster's metadata and handles all client requests, while the DataNodes store the data. Whether the NameNode works properly therefore determines the high availability of the entire distributed file system. This thesis analyzes several existing proposals for guaranteeing NameNode high availability and summarizes their advantages and disadvantages. Then, a flat, highly available NameNode model is proposed. The proposal not only shortens the recovery time of HDFS when a NameNode crashes, but also achieves better load balancing when many clients request access concurrently.

(3) The system is implemented and analyzed. The improved Hadoop installation is deployed on a cluster of virtual machines that simulates a real cluster environment as the experimental platform. The experiments first measure the efficiency of encoding and of reconstructing original data for the proposed locally repairable erasure code and for the classic RS code; a further series of experiments then verifies the high availability of the flat NameNode model. The experimental results are presented as charts and screenshots.
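The repair-cost advantage claimed for locally repairable codes can be illustrated with a minimal sketch: data blocks are divided into small local groups, each protected by one XOR parity, so a single lost block is rebuilt by reading only its own group rather than the whole stripe as RS repair requires. This is a simplified illustration of the general idea, not the thesis's actual code construction; the function names and the group size of 2 are assumptions for the example.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def encode(data_blocks, group_size=2):
    """Split data blocks into local groups, each with one XOR local parity."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [(group, xor_blocks(group)) for group in groups]

def repair(group, parity, lost_index):
    """Rebuild one lost block by reading only its local group and parity,
    instead of pulling all surviving blocks of the stripe (the RS case)."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [parity])
```

For example, losing one block of a four-block stripe costs two reads here (one survivor plus the local parity), whereas RS decoding of the same stripe would read every remaining block.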
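One way a flat NameNode layer can balance metadata load and survive a crash is to place peer NameNodes on a consistent-hash ring, so each node owns a share of the path namespace and a failed node's share falls to its ring successors. The sketch below is hypothetical, it illustrates the general flat-model idea only, and the class name, the virtual-node count, and the use of MD5 hashing are all assumptions not taken from the thesis.

```python
import hashlib
from bisect import bisect

def _h(s):
    """Stable integer hash of a string (MD5, chosen only for determinism)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class MetadataRing:
    """Hypothetical router mapping file paths to peer NameNodes on a
    consistent-hash ring with virtual nodes for smoother balancing."""

    def __init__(self, namenodes, vnodes=64):
        self._ring = sorted((_h(f"{n}#{v}"), n)
                            for n in namenodes for v in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def owner(self, path):
        """Return the NameNode responsible for this path's metadata."""
        i = bisect(self._keys, _h(path)) % len(self._ring)
        return self._ring[i][1]

    def remove(self, namenode):
        """On failure, drop the node; its paths fall to ring successors,
        so only that node's share of the namespace is remapped."""
        self._ring = [(k, n) for k, n in self._ring if n != namenode]
        self._keys = [k for k, _ in self._ring]
```

Because only the failed node's arc of the ring is remapped, clients of the surviving NameNodes are undisturbed, which is consistent with the goals of short recovery time and load balancing stated above.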
Keywords/Search Tags: big data, HDFS, erasure code, high availability