Research And Design Of HDFS High Availability Based On Paxos

Posted on: 2013-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: P A Yang
GTID: 2248330374975551
Subject: Computer software and theory
Abstract/Summary:
Hadoop has become the preferred framework for massive data processing and has been called "the golden key to the 21st century". HDFS, the foundation of Hadoop, provides users with a distributed file system. However, the HDFS central server, the NameNode, is a single point of failure (SPOF). When the NameNode fails, the entire file system becomes unavailable, and this has become a troublesome problem for HDFS.

Various high-availability (HA) solutions have been proposed to address the SPOF of HDFS. Their core idea is to use a backup machine to guard against NameNode failure. When the NameNode fails, the standby can continue to provide read services to clients. However, these solutions require manual intervention and still risk data loss. A new approach to the SPOF of HDFS is therefore needed.

This thesis designs and implements an HDFS HA solution based on dual central servers. The solution removes the existing single point of failure and also overcomes the shortcomings of traditional HA solutions. Specifically, the main work of this thesis is as follows:

(1) Survey the currently popular HDFS high-availability solutions, all of which rely on a backup machine. These solutions fall into two categories, hot standby and cold standby, and the thesis identifies their common shortcomings. Study the HDFS architecture together with the status and role of the central server in HDFS, and propose a solution based on dual central servers.

(2) Study Paxos, widely regarded as the canonical distributed consensus algorithm, and design a three-machine Paxos algorithm adapted to a three-server environment. Based on this algorithm, propose a data synchronization framework called Quorum, which standardizes the dataflow of read and write operations. Under this framework, the servers can continue to provide read and write access to users even when a node fails.

(3) Study the HDFS source code and analyze its code structure in depth. Modify the HDFS code to conform to Quorum and implement the dual central servers, so that the guarantees provided by Quorum carry over to HDFS.

(4) Test the dual-central-server HDFS to verify the feasibility and effectiveness of this HA solution.

This work offers new ideas and methods for HDFS high availability. The three-machine Paxos algorithm and the Quorum data synchronization framework also have both theoretical and practical significance and broad applicability.
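The abstract does not spell out the three-machine Paxos algorithm. The sketch below shows classic single-decree Paxos with three in-process acceptors, not the thesis's variant, to illustrate why a three-server group can still reach agreement when one server is down: any two servers form a majority. The names `Acceptor` and `propose` are hypothetical. The sketch assumes ballots are unique per proposer, and it has no networking or durable storage. A real acceptor must persist its promises before replying.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """One Paxos acceptor (hypothetical); state is kept in memory only."""
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted
    alive: bool = True                           # False simulates a crashed server

    def prepare(self, ballot: int):
        """Phase 1b: promise to ignore ballots lower than `ballot`."""
        if not self.alive or ballot <= self.promised:
            return None                          # rejected, or no reply from a dead server
        self.promised = ballot
        return ("promise", self.accepted)

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has already been promised."""
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True


def propose(acceptors, ballot: int, value: str) -> Optional[str]:
    """Run one single-decree Paxos round; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1           # 2 of 3 servers
    # Phase 1a: collect promises from a majority.
    replies = [r for r in (a.prepare(ballot) for a in acceptors) if r is not None]
    if len(replies) < majority:
        return None
    # If some acceptor already accepted a value, adopt the highest-ballot one.
    prior = [acc for _, acc in replies if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: the value is chosen once a majority accepts it.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


if __name__ == "__main__":
    servers = [Acceptor(), Acceptor(), Acceptor()]
    servers[2].alive = False                     # one of the three servers has failed
    print(propose(servers, 1, "edit-log-entry-42"))  # edit-log-entry-42
    print(propose(servers, 2, "other"))              # edit-log-entry-42 (chosen value kept)
```

Adopting the highest previously accepted value in phase 1 is what keeps a later proposer from overwriting a value that a majority already chose.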
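The abstract says Quorum standardizes the read and write dataflow so that servers keep serving when a node fails, but it gives no details. One common way to get that property is a versioned majority quorum over N = 3 replicas with write quorum W = 2 and read quorum R = 2. Because R + W > N, every read quorum overlaps the most recent write quorum. The sketch below illustrates that general technique, not the thesis's Quorum framework. `Replica` and `QuorumStore` are hypothetical names. The sketch assumes a single coordinator serializes writes, since concurrent writers would need an ordering mechanism such as Paxos to assign versions safely.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """Hypothetical metadata server holding versioned key/value pairs in memory."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, str]] = {}   # key -> (version, value)
        self.alive = True

    def write(self, key: str, version: int, value: str) -> bool:
        if not self.alive:
            return False
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)        # keep only the newest version
        return True

    def read(self, key: str) -> Optional[Tuple[int, str]]:
        if not self.alive:
            raise ConnectionError("replica down")
        return self.data.get(key)


class QuorumStore:
    """Majority quorum over N replicas: W = R = N // 2 + 1, so R + W > N."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1          # 2 of 3

    def put(self, key: str, value: str) -> bool:
        # Learn the newest version from a read quorum, then write version + 1.
        latest = self._read_quorum(key)
        version = latest[0] + 1 if latest else 1
        acks = sum(r.write(key, version, value) for r in self.replicas)
        return acks >= self.quorum

    def get(self, key: str) -> Optional[str]:
        latest = self._read_quorum(key)
        return latest[1] if latest else None

    def _read_quorum(self, key: str) -> Optional[Tuple[int, str]]:
        answers = []
        for r in self.replicas:
            try:
                answers.append(r.read(key))
            except ConnectionError:
                continue                              # skip a failed server
            if len(answers) >= self.quorum:
                break
        if len(answers) < self.quorum:
            raise RuntimeError("no read quorum available")
        found = [a for a in answers if a is not None]
        return max(found) if found else None          # highest version wins
```

For example, if a write succeeds while one replica is down and that replica later returns with stale data, any two replicas a reader contacts still include one holding the newest version.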
Keywords: HDFS, SPOF, High Availability, Paxos, Data Synchronization, Dual Central Server