
Research And Optimization Of Mass Small Files Based On HDFS

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhang
Full Text: PDF
GTID: 2308330485984492
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer and information technology, application systems are growing quickly in scale and the data produced by industry applications is growing explosively. Traditional storage technology finds it increasingly difficult to cope with massive data storage. Hadoop, the distributed computing platform developed by the Apache Software Foundation, quickly became the first choice of research institutions and enterprises for handling big data. At the same time, the strong rise and rapid development of the Internet industry has spawned huge numbers of small files of many types. Hadoop, however, was originally designed mainly for storing large files and gave little consideration to storing large numbers of small files. If masses of small files are uploaded directly to HDFS without any preprocessing, the file metadata held in NameNode memory becomes bloated and file access becomes inefficient.

Taking advantage of Hadoop's strength in processing large files, we address the small-file storage problem by merging, so that Hadoop also becomes suitable for small-file storage. Before presenting the optimization scheme, this thesis first carries out a quantitative analysis of NameNode memory consumption and file access efficiency. Small-file optimization can then be approached from several directions, including reducing the number of files the NameNode manages and reducing the time a DataNode spends reading block data from disk. The merging strategy proposed in this thesis is implemented on top of MapFile: an index is created as the small files are merged, and the index information is stored in HBase. To speed up file retrieval, a cache module is introduced, and an improved cache replacement strategy is used that suits the characteristics of small files.

HDFS offers users access through the Hadoop Shell and HTTP, neither of which gives users an intuitive experience, and file operations are not very convenient. For these reasons, this thesis designs a virtual file sharing system based on CBFS. The remote HDFS file system is virtualized as a disk in the local Windows file system, which gives users an intuitive view and greatly simplifies their operations. Finally, the proposed optimization scheme is compared with original HDFS and with SequenceFile. The results show marked improvement in both high NameNode memory consumption and high file access latency, which demonstrates the feasibility and effectiveness of the small-file optimization proposed in this thesis.
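The merge, index, and cache idea described in the abstract can be illustrated with a small in-memory sketch. This is not the thesis's implementation: a single bytes object stands in for the merged MapFile, a plain dictionary stands in for the index held in HBase, and ordinary LRU eviction stands in for the improved cache replacement strategy, which the abstract does not specify. The names `merge_small_files`, `LRUCache`, and `MergedStore` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def merge_small_files(files):
    """Concatenate small files into one blob and record (offset, length) per name.

    Assumption: `files` maps file name -> bytes. The blob stands in for a
    merged container file; the returned dict stands in for the external index.
    """
    blob = bytearray()
    index = {}
    for name in sorted(files):  # keep keys in sorted order, as a keyed container would
        data = files[name]
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


class LRUCache:
    """Plain least-recently-used cache (a stand-in, not the thesis's strategy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class MergedStore:
    """Serve small-file reads from the merged blob, consulting the cache first."""

    def __init__(self, files, cache_capacity=2):
        self.blob, self.index = merge_small_files(files)
        self.cache = LRUCache(cache_capacity)
        self.hits = 0
        self.misses = 0

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None:
            self.hits += 1
            return cached
        offset, length = self.index[name]  # index lookup replaces a per-file metadata entry
        data = self.blob[offset:offset + length]
        self.cache.put(name, data)
        self.misses += 1
        return data


if __name__ == "__main__":
    store = MergedStore({"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"c"})
    print(store.index)
    print(store.read("b.txt"), store.read("b.txt"), store.hits, store.misses)
```

In this sketch the storage layer tracks one merged object instead of one entry per small file, which is the effect the merging approach aims for on the NameNode, and repeated reads of the same small file are served from the cache without touching the blob.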
Keywords/Search Tags:mass small files, HDFS, merge, index, cache