
The Design And Implementation Of High Availability HDFS Management

Posted on: 2014-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Ji
GTID: 2248330395995490
Subject: Software engineering
Abstract/Summary:
Distributed file systems are used more and more widely as the volume of data and the number of data-intensive applications grow rapidly. Most mainstream distributed file systems, such as HDFS, store metadata and file data separately: metadata on a metadata node (the Namenode in HDFS) and file data on datanodes. The availability of the metadata node therefore determines the availability of the whole distributed file system.

This thesis studies the availability of the metadata node in depth, covering the metadata node management modes used in distributed file systems together with their advantages and disadvantages, the key technologies for improving metadata node availability, and several practical high-availability system architectures. It also introduces the DRBD and Pacemaker technologies.

Using DRBD and Pacemaker, I design and implement a high-availability HDFS management system with the following key features:
(1) Active/Cold Standby Namenodes are used, so that service can be recovered after the active Namenode crashes.
(2) DRBD backs up the Namenode metadata and edit logs; Pacemaker monitors the status of the system services and fails over automatically when a failure occurs.
(3) Hadoop cluster management is provided, with which users can easily deploy HDFS, monitor service status, and manage HDFS.

Tests on this high-availability system show that failover completes within one minute with the SecondaryNamenode enabled, and that the system has only a slight effect on Namenode operation performance and on HDFS read and write performance.
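The Active/Cold Standby failover described in feature (2) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's implementation: the monitor loop stands in for Pacemaker, a failover fires after a hypothetical `max_failures` consecutive failed health checks, and the standby then starts a hypothetical resource chain (`drbd_promote`, `fs_mount`, `virtual_ip`, `namenode`) in dependency order. In a real cluster these steps would typically map to Pacemaker resources such as a promotable DRBD resource, `ocf:heartbeat:Filesystem` and `ocf:heartbeat:IPaddr2`, tied together with order and colocation constraints; the virtual IP is an assumption not stated in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical resource chain, started in this order on the node that takes over.
# A cold standby runs none of these until failover (assumption for illustration).
START_ORDER = ["drbd_promote", "fs_mount", "virtual_ip", "namenode"]


@dataclass
class Node:
    name: str
    healthy: bool = True
    running: list = field(default_factory=list)

    def start_chain(self):
        # Promote the DRBD device first so the replicated Namenode metadata
        # and edit logs are writable before the Namenode process starts.
        for res in START_ORDER:
            self.running.append(res)


@dataclass
class FailoverManager:
    active: Node
    standby: Node
    max_failures: int = 3  # consecutive failed monitors before failover (assumed value)
    failures: int = 0
    events: list = field(default_factory=list)

    def monitor_once(self):
        # One monitor interval: reset the counter on success, count failures otherwise.
        if self.active.healthy:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failover()

    def failover(self):
        # Stop all resources on the failed node (a real cluster would fence it here),
        # start the chain on the standby, then swap roles.
        self.active.running.clear()
        self.standby.start_chain()
        self.events.append(f"failover {self.active.name} -> {self.standby.name}")
        self.active, self.standby = self.standby, self.active
        self.failures = 0


if __name__ == "__main__":
    nn1, nn2 = Node("nn1"), Node("nn2")
    nn1.start_chain()
    mgr = FailoverManager(nn1, nn2)
    nn1.healthy = False
    for _ in range(3):
        mgr.monitor_once()
    print(mgr.active.name, mgr.events)
```

Requiring several consecutive failures before acting trades a slightly longer detection time for fewer spurious failovers caused by a single slow health check.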
Keywords/Search Tags: Distributed File System, HDFS, High Availability, DRBD, Pacemaker, Hadoop Cluster Management