
Research And Implementation Of Hadoop Small File Processing Technology

Posted on: 2017-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: K Sheng
Full Text: PDF
GTID: 2348330518496600
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid growth of the Internet, data is attracting more and more attention. However, the existing Hadoop framework faces a serious bottleneck when handling large numbers of small files. Based on the characteristics of small files and of the Hadoop framework, this paper proposes two solutions.

The Hadoop distributed file system (HDFS) consists of a NameNode and DataNodes. Because the NameNode stores the metadata of every file, storing a large number of small files in HDFS consumes a large amount of NameNode memory, so reducing the NameNode's memory consumption is an important goal. In addition, when a file is read, the client must exchange data with both the NameNode and the DataNodes, so speeding up file access through a reasonable method is another important target.

This paper divides small files into structurally related small files and logically related small files. For the former, where there is a clear correlation among the small files, a file-merging strategy with a local index combines the small files into one large file, and a three-level cache with prefetching is used to reduce the NameNode's memory consumption and the reading time of small files. For the latter, where there is no clear correlation among the small files, a file-grouping strategy with a global index places the small files into the same logical unit, and likewise applies the three-level cache and prefetching to reduce the NameNode's memory consumption and the reading time of small files.
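The merge-with-index strategy described above can be sketched as follows. This is a minimal, HDFS-free Python illustration of the general idea (merge many small files into one large file and keep an index of each file's offset and length); the function names and in-memory index are assumptions for illustration, not the thesis's actual implementation, which would operate on HDFS blocks and NameNode metadata.

```python
import io

def merge_files(files):
    """Merge {name: bytes} small files into one blob.

    Returns the merged blob and a local index mapping
    name -> (offset, length), so only the single merged
    file needs an entry in the NameNode's metadata.
    """
    index = {}
    buf = io.BytesIO()
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index

def read_file(blob, index, name):
    """Read one original small file back via the local index."""
    offset, length = index[name]
    return blob[offset:offset + length]

small = {"a.txt": b"alpha", "b.txt": b"beta"}
blob, idx = merge_files(small)
assert read_file(blob, idx, "b.txt") == b"beta"
```

The same index-lookup pattern underlies the logically related case, except that the global index spans a logical group of files rather than one merged file.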
Keywords/Search Tags: Hadoop, massive small files, file merging, file packing, three-level cache