
Improvement of HDFS Small File Storage Based on HAR

Posted on: 2018-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zeng
Full Text: PDF
GTID: 2348330542459899
Subject: Computer technology
Abstract/Summary:
With the continuous development of information technology, we live in an era of explosive data growth and of increasingly data-driven decision making. Faced with massive data, traditional storage technology has shown its shortcomings more and more clearly: poor scalability, weak data security, high maintenance and management costs, and poor disaster recovery. How to process and store massive data efficiently has therefore become a problem that information technology must continuously improve upon and resolve. Hadoop, with its high reliability, high scalability, high efficiency, and low cost, has gradually grown into a complete big data ecosystem and provides a comprehensive solution for big data applications; it has undoubtedly become a powerful driving force for the big data industry.

When HDFS must store a large number of small files, the NameNode's memory footprint becomes too high and clients interact with the NameNode too frequently, which aggravates the NameNode single point of failure and lowers cluster efficiency. This thesis optimizes Hadoop Archive (HAR): the index structure of HAR is improved by changing its two-level index into a single-level index and by using a more efficient hash function, which improves the efficiency of reading small files from an archive while reducing the NameNode's memory footprint.

The thesis further proposes using consistent hashing to improve the distribution of index files across NameNodes. Compared with the common modulo-based placement, consistent hashing offers better scalability and fault tolerance: when cluster membership changes, the amount of data that must migrate is minimized. At the same time, the load-balancing ability of consistent hashing is preserved, and the distribution of data can be adapted to the processing capacity of each server. Finally, the load-balancing problem in HDFS NameNode Federation is analyzed, and some work is done toward changing its architecture.
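A minimal sketch of the single-level index idea, assuming a flat in-memory hash table (the class and method names below are hypothetical illustrations, not the thesis's actual implementation): a small file's name is hashed directly to its offset and length inside the concatenated archive part file, so one hash probe replaces HAR's two-level _masterindex/_index lookup.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a single-level HAR index: file name -> (offset, length)
// within the archive's part file, looked up with one hash probe.
public class FlatHarIndex {
    // Location of one archived small file inside the part file.
    public static final class Entry {
        final long offset;   // byte offset in the part file
        final long length;   // file length in bytes
        Entry(long offset, long length) { this.offset = offset; this.length = length; }
    }

    private final Map<String, Entry> index = new HashMap<>();

    // Called once per small file while the archive is being built.
    public void put(String fileName, long offset, long length) {
        index.put(fileName, new Entry(offset, length));
    }

    // One hash lookup replaces the two-level master-index + index scan.
    public Entry lookup(String fileName) {
        return index.get(fileName);
    }

    public static void main(String[] args) {
        FlatHarIndex idx = new FlatHarIndex();
        idx.put("logs/app-001.log", 0L, 4096L);
        idx.put("logs/app-002.log", 4096L, 2048L);
        Entry e = idx.lookup("logs/app-002.log");
        System.out.println("offset=" + e.offset + " length=" + e.length);
    }
}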
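A minimal sketch of the consistent-hashing placement, again with hypothetical names and parameters: each NameNode is mapped to several virtual nodes on a 64-bit hash ring, an index file is stored on the first node clockwise from its hash, and adding or removing a NameNode only remaps the keys on the affected arcs, which is what keeps data migration small when the cluster changes.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical consistent-hashing ring for placing index files on NameNodes.
public class ConsistentHashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    public void addNode(String nameNode) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(nameNode + "#" + i), nameNode);
    }

    public void removeNode(String nameNode) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(nameNode + "#" + i));
    }

    // Walk clockwise to the first virtual node at or after the key's hash.
    public String nodeFor(String indexFile) {
        long h = hash(indexFile);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the MD5 digest as a 64-bit ring position.
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("namenode-1");
        ring.addNode("namenode-2");
        ring.addNode("namenode-3");
        System.out.println(ring.nodeFor("archive-42_index"));
        ring.removeNode("namenode-2");   // only keys on namenode-2's arcs move
        System.out.println(ring.nodeFor("archive-42_index"));
    }
}

Weighting the number of virtual nodes per NameNode would be one way to adapt the distribution to each server's processing capacity, as the abstract suggests.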
Keywords/Search Tags:Big Data, Hadoop, HDFS, Small file problem, Consistent hash