Research And Implementation Of HDFS High Availability Based On Cluster

Posted on: 2013-05-19 | Degree: Master | Type: Thesis
Country: China | Candidate: M Huang | Full Text: PDF
GTID: 2208330434472643 | Subject: Software engineering
Abstract/Summary:
As the world’s information grows dramatically and data is generated ever faster, enterprises face increasingly demanding requirements for the technology used to store and process it, especially "Big Data". A growing number of enterprises, including Yahoo!, Facebook and IBM, deploy their Big Data systems on Hadoop. The Hadoop Distributed File System (HDFS) is the core of Hadoop's data processing, so it has become a key layer of the IT infrastructure in many large companies, especially Internet companies. However, enterprises hold every part of their IT to a Service Level Agreement (SLA) covering, for example, high availability, high performance and data integrity. HDFS, developed by the open source community, was originally designed for high throughput. Its DataNodes are designed with redundancy, but its NameNode, which manages the entire file system, is a single point of failure that seriously lowers the achievable SLA. Eliminating this risk has therefore become critical for enterprises deploying HDFS. This thesis first discusses the problems HDFS faces as a Big Data solution, and then introduces the history and mechanisms of both HDFS and high-availability clusters in IT infrastructure for Big Data. Building on that discussion, we study the NameNode’s working process and its native backup method in depth, and then design and implement an HDFS NameNode cluster based on the Red Hat Linux cluster suite, with optimized startup, monitoring, recovery and stop functions for the three service layers (network, application and shared storage access) of the NameNode service and data backup. Finally, we simulate a range of enterprise failure and disaster scenarios to test the NameNode cluster. The tests show good results for performance, availability and data integrity, so the work delivers a complete high-availability HDFS cluster.
Keywords/Search Tags: Hadoop, HDFS, Distributed File System, single-point failure, high-availability, cluster