
Research And Implementation Of Small File Processing Techniques In Hadoop

Posted on: 2014-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: G J Chen
GTID: 2248330395984229
Subject: Computer technology

Abstract/Summary:
With the rapid development of the Internet, traditional technical architectures have become increasingly inadequate for handling massive amounts of data. Hadoop, a framework for distributed processing of massive data, consists of the HDFS file system and the MapReduce programming model. HDFS follows a master-slave design pattern with a single name node, which simplifies the structure of the file system but at the same time leads to inefficient storage of small files.

To cope with HDFS's storage inefficiency for small files and MapReduce's overhead when processing a large number of small files, this thesis applies the archive file approach and the sequence file approach: small files are merged into large files, and a mapping from each small file to its containing large file is created. In the experimental section, the thesis validates these optimizations of small-file processing in Hadoop through several system tests with experimental use cases. By comparing the time to upload small files to the local file system and to HDFS, the file access time before and after merging, and the system memory occupied when reading files, the proposed approaches are shown to suit the MapReduce computation model and to improve the efficiency of random access to small files.
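The merge-and-index idea underlying both the archive file and sequence file approaches can be sketched in plain Python. This is a conceptual illustration only, independent of Hadoop's actual HAR/SequenceFile APIs; the function names are illustrative: small files are concatenated into one large blob, and an index maps each file name to its (offset, length), so a small file can later be read back with a single random access.

```python
import io

def merge_small_files(files):
    """Concatenate small files into one large blob.

    Returns the merged bytes plus an index mapping each
    file name to its (offset, length) within the blob.
    """
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        offset = blob.tell()          # position where this file starts
        blob.write(data)
        index[name] = (offset, len(data))
    return blob.getvalue(), index

def read_small_file(blob, index, name):
    """Random access: fetch one small file via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Example: two small files merged, then one read back by name.
files = {"a.txt": b"alpha", "b.txt": b"bravo"}
blob, index = merge_small_files(files)
print(read_small_file(blob, index, "b.txt"))  # prints b'bravo'
```

The key point mirrored from the thesis: because the name node tracks only the single merged file rather than every small file, its metadata load drops, while the per-file index preserves efficient random access.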
Keywords/Search Tags: Hadoop, Massive Small Files, MapReduce, Merging, Index