
Research And Optimization Of Storage Mechanism In Hadoop Distributed File System

Posted on: 2019-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Lv
Full Text: PDF
GTID: 2428330545459444
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet into all fields, data has grown explosively, and traditional approaches to data processing are no longer adequate. Technologies for data storage and processing, such as cloud storage and cloud computing, have developed rapidly in response. The Hadoop Distributed File System (HDFS), the fundamental storage layer of the cloud platform, has become increasingly popular among major enterprises and research institutions because of its high scalability, high fault tolerance, open-source licensing, and ability to run on low-cost commodity machines, and it plays an increasingly important role in education, finance, medicine, and the military.

The original HDFS follows a "one master, multiple slaves" architecture that stores metadata and file data separately, with the NameNode managing the namespace of the whole file system. Although this design simplifies the system structure, it makes the NameNode a single point of failure: once the master node fails, the whole cluster becomes unavailable. In addition, HDFS is designed to serve large files in a streaming fashion and is poorly suited to storing or processing massive numbers of small files. Yet social and shopping websites produce small files in large quantities, and storing them in HDFS directly causes high memory consumption on the NameNode and low processing efficiency.

Regarding NameNode high availability, this thesis surveys the HA solutions of several early Hadoop versions and then analyzes the HDFS HA mechanism introduced in Hadoop 2.X. After a thorough study of that mechanism, the author proposes adding a further standby node to the current HA system, together with an optimized scheme for metadata consistency and active/standby switchover, which opens the way to running multiple NameNodes in a cluster. Experiments verify that the optimized scheme not only guarantees metadata consistency but also switches over automatically even when two nodes fail, with a switching time much shorter than that of the original HA scheme.

For the low efficiency of HDFS on small files, the thesis puts forward a corresponding improvement. The proposed scheme adds a small-file processing unit to the original HDFS architecture; this unit merges files using an algorithm based on the volume of the small files and builds a two-level index for file access. The merging process takes the sizes of the small files into account so that each block is filled as fully as possible, which reduces the number of merged files and relieves the memory pressure on the NameNode. The solution also records, according to each small file's name and type, the block it maps to and its address information within that block; sub-indexes organized by file type are combined into a global index held in the small-file processing unit, which improves retrieval efficiency. Finally, comparison tests on a Hadoop platform against the original HAR (Hadoop Archive) solution show that the proposed scheme significantly improves the efficiency of storing and accessing small files in HDFS.
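The abstract does not spell out the switchover algorithm, so the following is only an illustrative sketch of one plausible design: priority-ordered failover among an active and multiple standby NameNodes, with an epoch number bumped on each switchover to fence a deposed active. All function, field, and node names here are hypothetical, not the thesis's code.

```python
def elect_active(nodes, epoch):
    """Pick the highest-priority healthy NameNode as the new active.

    `nodes` is a list of dicts like {"id": "nn1", "priority": 0, "healthy": True}.
    Returns (new_active_id, new_epoch). Bumping the epoch is a common fencing
    technique: writes stamped with a stale epoch can be rejected by the shared
    edit-log storage, so a revived old active cannot corrupt metadata.
    """
    for node in sorted(nodes, key=lambda n: n["priority"]):
        if node["healthy"]:
            return node["id"], epoch + 1
    return None, epoch  # no healthy node: the cluster has no active


# With one active and two standbys, even a two-node failure leaves a
# standby that can take over automatically.
cluster = [
    {"id": "nn1", "priority": 0, "healthy": False},  # failed active
    {"id": "nn2", "priority": 1, "healthy": False},  # failed first standby
    {"id": "nn3", "priority": 2, "healthy": True},   # surviving second standby
]
active, epoch = elect_active(cluster, epoch=7)
```

In this sketch `nn3` becomes active with epoch 8, mirroring the two-node-failure case the experiments describe.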
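The exact merging algorithm is not reproduced in the abstract. A minimal sketch of volume-aware merging, assuming a simple first-fit-decreasing packing of small files into fixed-size HDFS blocks so that each block's space is used as fully as possible, could look like this (function and field names are illustrative, not the thesis's code):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size, 128 MB

def merge_small_files(files, block_size=BLOCK_SIZE):
    """Pack (name, size) pairs into as few blocks as possible.

    First-fit decreasing: place the largest files first, each into the first
    block with enough free space. Fewer merged blocks means fewer entries in
    NameNode memory, which is the memory-pressure relief the scheme targets.
    """
    blocks = []  # each block: {"free": bytes left, "files": [(name, offset, size)]}
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        for block in blocks:
            if block["free"] >= size:
                offset = block_size - block["free"]  # current fill level
                block["files"].append((name, offset, size))
                block["free"] -= size
                break
        else:  # no existing block fits: open a new one
            blocks.append({"free": block_size - size,
                           "files": [(name, 0, size)]})
    return blocks
```

Each merged block records, for every small file it holds, the offset and length needed later by the index, so a reader can seek straight to the file inside the merged block.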
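The two-level index can likewise be sketched as an in-memory structure: a global index keyed by file type, where each entry is a sub-index mapping a file name to the merged block and address information recorded during merging. This is a hypothetical illustration of the idea, not the thesis's implementation.

```python
from collections import defaultdict

class TwoLevelIndex:
    """Global index keyed by file type; sub-indexes map names to locations.

    Grouping by type first shrinks the search space for a lookup: the global
    index selects the type's sub-index, and the sub-index resolves the file
    name to (block_id, offset, size) inside a merged block.
    """

    def __init__(self):
        # type -> {file name -> (block_id, offset, size)}
        self.global_index = defaultdict(dict)

    @staticmethod
    def _file_type(name):
        return name.rsplit(".", 1)[-1].lower() if "." in name else ""

    def add(self, name, block_id, offset, size):
        self.global_index[self._file_type(name)][name] = (block_id, offset, size)

    def lookup(self, name):
        """Return (block_id, offset, size) for a small file, or None."""
        return self.global_index.get(self._file_type(name), {}).get(name)
```

A client resolving `photo.jpg` consults only the `jpg` sub-index rather than scanning every indexed file, which is the retrieval-efficiency gain the scheme claims.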
Keywords/Search Tags: Cloud Storage, HDFS, NameNode, High Availability, Small Files