
Research And Implementation Of Hadoop Small File Processing Technology

Posted on: 2017-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: K Sheng
Full Text: PDF
GTID: 2348330518496600
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid growth of the Internet, data is attracting more and more attention. However, the existing Hadoop framework faces a serious bottleneck when handling large numbers of small files. Based on the characteristics of small files and of the Hadoop framework, this paper proposes two solutions.

The Hadoop distributed file system (HDFS) consists of a NameNode and DataNodes. Because the NameNode stores the metadata of every file, storing a large number of small files in HDFS consumes a large amount of NameNode memory, so reducing the NameNode's memory consumption is an important goal. In addition, when a file is read, the client must exchange data with both the NameNode and the DataNodes, so speeding up file access through a reasonable method is another important target.

This paper divides small files into structurally related small files and logically related small files. For the former, where there is a clear correlation among the small files, a file-merging strategy with a local index combines the small files into one large file, and a three-level cache with prefetching is used to reduce the NameNode's memory consumption and the reading time of small files. For the latter, where there is no clear correlation among the small files, a file-grouping strategy with a global index places the small files into the same logical unit, and likewise applies the three-level cache and prefetching to reduce the NameNode's memory consumption and the reading time of small files.
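The merge-with-index strategy described above can be sketched as follows. This is a minimal, HDFS-free Python illustration of the general idea (merge many small files into one large file and keep an index of each file's offset and length); the function names and in-memory index are assumptions for illustration, not the thesis's actual implementation, which would operate on HDFS blocks and NameNode metadata.

```python
import io

def merge_files(files):
    """Merge {name: bytes} small files into one blob.

    Returns the merged blob and a local index mapping
    name -> (offset, length), so only the single merged
    file needs an entry in the NameNode's metadata.
    """
    index = {}
    buf = io.BytesIO()
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index

def read_file(blob, index, name):
    """Read one original small file back via the local index."""
    offset, length = index[name]
    return blob[offset:offset + length]

small = {"a.txt": b"alpha", "b.txt": b"beta"}
blob, idx = merge_files(small)
assert read_file(blob, idx, "b.txt") == b"beta"
```

The same index-lookup pattern underlies the logically related case, except that the global index spans a logical group of files rather than one merged file.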
Keywords/Search Tags: Hadoop, massive small files, file merging, file packing, three-level cache