
Research On The Metadata Management Of Multi Namenodes Based On HDFS

Posted on: 2014-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2268330401965372
Subject: Computer application technology
Abstract/Summary:
This is an era of rapid data expansion. According to estimates by the technology research firm IDC, the amount of data grows by about 50% every year, which means it roughly doubles every two years. The flood of data is not only getting broader, it is also branching into entirely new kinds of data. Data is now characterized by volume, variety, velocity, and high value, so conventional software solutions can no longer store it, manage it, and process tasks on it within a tolerable time. How to cope with this data challenge has become a burning question. As a result, more and more universities, institutes, and Internet companies have devoted themselves to research on models and tools for the storage and computing of large-scale data. As a technology stack well suited to building distributed storage and computing infrastructure, the Hadoop project of the Apache Software Foundation has attracted tremendous attention since its birth and has become a hotspot of research and application in industry. It has even been honoured as "the golden key to open the gate of big data".

The main research object of this thesis is HDFS, the distributed file system of the Hadoop project. As the base module of Hadoop, HDFS provides data services for several upper-layer tools such as MapReduce and HBase. However, applying HDFS to large-scale distributed projects exposes some defects in its architecture. Firstly, the HDFS architecture with a single Namenode reduces the availability of the system. Secondly, the single Namenode becomes the bottleneck of the whole file system.

To address these defects, the thesis proposes an improved HDFS architecture with multiple Namenodes. The functions of the Namenode are extended to other nodes, which together form a Namenode cluster. The nodes in this cluster take one of two roles: Namenode Leader and Common Namenode(s). The Namenode Leader obtains the availability status and the load information of the Namenode cluster by means of a heartbeat mechanism. After studying several distributed consistency mechanisms, the thesis designs an election mechanism for the Namenode Leader based on the Paxos algorithm. It also designs the process flow for handling node failures.

For the metadata management problem introduced by the distributed Namenode architecture, the thesis proposes a novel metadata distribution mechanism based on a hash algorithm and the real-time load conditions of the cluster. The namespace and the metadata service maintained by the Namenode are distributed according to the metadata structure and the access characteristics. Furthermore, the thesis designs a metadata redundancy mechanism for the cluster and a metadata consistency guarantee mechanism based on the classical distributed consistency algorithm Paxos.

Finally, several experiments are conducted to demonstrate the high availability and the read and write efficiency of the improved HDFS architecture.
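The abstract does not spell out how the hash algorithm and the real-time load are combined, so the following Python sketch is only one plausible reading, not the mechanism from the thesis. It assumes that a hash of the path picks a default Namenode and that the choice falls back to the least-loaded Namenode when the default is overloaded. The names `hash_candidate`, `place_metadata`, and `overload_ratio`, and the representation of load as a single number per Namenode, are all hypothetical.

```python
import hashlib


def hash_candidate(path, namenodes):
    """Pick a default Namenode for a path by hashing the path (assumed scheme)."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return namenodes[int(digest, 16) % len(namenodes)]


def place_metadata(path, loads, overload_ratio=1.5):
    """Choose the Namenode that should hold the metadata for `path`.

    `loads` maps a Namenode name to its current load, for example as
    reported to the Namenode Leader through heartbeats. `overload_ratio`
    is a hypothetical tuning parameter, not a value from the thesis.
    """
    namenodes = sorted(loads)  # fixed order so the hash mapping is stable
    candidate = hash_candidate(path, namenodes)
    mean_load = sum(loads.values()) / len(loads)
    # Fall back to the least-loaded Namenode only if the hashed choice
    # is clearly above the cluster average.
    if mean_load > 0 and loads[candidate] > overload_ratio * mean_load:
        return min(namenodes, key=lambda name: loads[name])
    return candidate
```

In a scheme like this, any placement that deviates from the hash would have to be recorded somewhere (for instance at the Namenode Leader) so that later lookups can find the metadata. That bookkeeping is left out of the sketch.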
Keywords/Search Tags: HDFS, Namenode, metadata, high availability