
Research And Optimization Of Storage Mechanism In Hadoop Distributed File System

Posted on: 2019-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Lv
Full Text: PDF
GTID: 2428330545459444
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet into all fields, data has grown explosively, and traditional approaches to data processing are no longer adequate. Technologies for data storage and processing, such as cloud storage and cloud computing, have developed rapidly in response. The Hadoop Distributed File System (HDFS), the fundamental storage layer of the cloud platform, has become increasingly popular among major enterprises and research institutions because of its high scalability, high fault tolerance, open-source licensing, and ability to run on low-cost commodity machines, and it plays an increasingly important role in education, finance, medicine, and the military.

The original HDFS follows a "one master, multiple slaves" architecture that stores metadata and file data separately, with the NameNode managing the namespace of the whole file system. Although this design simplifies the system structure, it makes the NameNode a single point of failure: once the master node fails, the whole cluster becomes unavailable. In addition, HDFS is designed to serve large files in a streaming fashion and is poorly suited to storing or processing massive numbers of small files. Yet social and shopping websites produce small files in large quantities, and storing them in HDFS directly causes high memory consumption on the NameNode and low processing efficiency.

Regarding NameNode high availability, this thesis surveys the HA solutions of several early Hadoop versions and then analyzes the HDFS HA mechanism introduced in Hadoop 2.X. After a thorough study of that mechanism, the author proposes adding a further standby node to the current HA system, together with an optimized scheme for metadata consistency and active/standby switchover, which opens the way to running multiple NameNodes in a cluster. Experiments verify that the optimized scheme not only guarantees metadata consistency but also switches over automatically even when two nodes fail, with a switching time much shorter than that of the original HA scheme.

For the low efficiency of HDFS on small files, the thesis puts forward a corresponding improvement. The proposed scheme adds a small-file processing unit to the original HDFS architecture; this unit merges files using an algorithm based on the volume of the small files and builds a two-level index for file access. The merging process takes the sizes of the small files into account so that each block is filled as fully as possible, which reduces the number of merged files and relieves the memory pressure on the NameNode. The solution also records, according to each small file's name and type, the block it maps to and its address information within that block; sub-indexes organized by file type are combined into a global index held in the small-file processing unit, which improves retrieval efficiency. Finally, comparison tests on a Hadoop platform against the original HAR (Hadoop Archive) solution show that the proposed scheme significantly improves the efficiency of storing and accessing small files in HDFS.
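The abstract does not spell out the switchover algorithm, so the following is only an illustrative sketch of one plausible design: priority-ordered failover among an active and multiple standby NameNodes, with an epoch number bumped on each switchover to fence a deposed active. All function, field, and node names here are hypothetical, not the thesis's code.

```python
def elect_active(nodes, epoch):
    """Pick the highest-priority healthy NameNode as the new active.

    `nodes` is a list of dicts like {"id": "nn1", "priority": 0, "healthy": True}.
    Returns (new_active_id, new_epoch). Bumping the epoch is a common fencing
    technique: writes stamped with a stale epoch can be rejected by the shared
    edit-log storage, so a revived old active cannot corrupt metadata.
    """
    for node in sorted(nodes, key=lambda n: n["priority"]):
        if node["healthy"]:
            return node["id"], epoch + 1
    return None, epoch  # no healthy node: the cluster has no active


# With one active and two standbys, even a two-node failure leaves a
# standby that can take over automatically.
cluster = [
    {"id": "nn1", "priority": 0, "healthy": False},  # failed active
    {"id": "nn2", "priority": 1, "healthy": False},  # failed first standby
    {"id": "nn3", "priority": 2, "healthy": True},   # surviving second standby
]
active, epoch = elect_active(cluster, epoch=7)
```

In this sketch `nn3` becomes active with epoch 8, mirroring the two-node-failure case the experiments describe.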
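The exact merging algorithm is not reproduced in the abstract. A minimal sketch of volume-aware merging, assuming a simple first-fit-decreasing packing of small files into fixed-size HDFS blocks so that each block's space is used as fully as possible, could look like this (function and field names are illustrative, not the thesis's code):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size, 128 MB

def merge_small_files(files, block_size=BLOCK_SIZE):
    """Pack (name, size) pairs into as few blocks as possible.

    First-fit decreasing: place the largest files first, each into the first
    block with enough free space. Fewer merged blocks means fewer entries in
    NameNode memory, which is the memory-pressure relief the scheme targets.
    """
    blocks = []  # each block: {"free": bytes left, "files": [(name, offset, size)]}
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        for block in blocks:
            if block["free"] >= size:
                offset = block_size - block["free"]  # current fill level
                block["files"].append((name, offset, size))
                block["free"] -= size
                break
        else:  # no existing block fits: open a new one
            blocks.append({"free": block_size - size,
                           "files": [(name, 0, size)]})
    return blocks
```

Each merged block records, for every small file it holds, the offset and length needed later by the index, so a reader can seek straight to the file inside the merged block.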
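The two-level index can likewise be sketched as an in-memory structure: a global index keyed by file type, where each entry is a sub-index mapping a file name to the merged block and address information recorded during merging. This is a hypothetical illustration of the idea, not the thesis's implementation.

```python
from collections import defaultdict

class TwoLevelIndex:
    """Global index keyed by file type; sub-indexes map names to locations.

    Grouping by type first shrinks the search space for a lookup: the global
    index selects the type's sub-index, and the sub-index resolves the file
    name to (block_id, offset, size) inside a merged block.
    """

    def __init__(self):
        # type -> {file name -> (block_id, offset, size)}
        self.global_index = defaultdict(dict)

    @staticmethod
    def _file_type(name):
        return name.rsplit(".", 1)[-1].lower() if "." in name else ""

    def add(self, name, block_id, offset, size):
        self.global_index[self._file_type(name)][name] = (block_id, offset, size)

    def lookup(self, name):
        """Return (block_id, offset, size) for a small file, or None."""
        return self.global_index.get(self._file_type(name), {}).get(name)
```

A client resolving `photo.jpg` consults only the `jpg` sub-index rather than scanning every indexed file, which is the retrieval-efficiency gain the scheme claims.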
Keywords/Search Tags: Cloud Storage, HDFS, NameNode, High Availability, Small Files