
Research And Implementation Of Small File Processing Techniques In Hadoop

Posted on: 2014-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: G J Chen
GTID: 2248330395984229
Subject: Computer technology

Abstract/Summary:
With the rapid development of the Internet, traditional technical architectures have become increasingly inadequate for handling massive amounts of data. Hadoop, a framework for distributed processing of massive data, consists of the HDFS file system and the MapReduce programming model. HDFS follows a master-slave design pattern with a single name node, which simplifies the structure of the file system but at the same time leads to inefficient storage of small files.

To cope with HDFS's storage inefficiency for small files and MapReduce's overhead when processing a large number of small files, this thesis applies the archive file approach and the sequence file approach: small files are merged into large files, and a mapping from each small file to its containing large file is created. In the experimental section, the thesis validates these optimizations of small-file processing in Hadoop through several system tests with experimental use cases. By comparing the time to upload small files to the local file system and to HDFS, the file access time before and after merging, and the system memory occupied when reading files, the proposed approaches are shown to suit the MapReduce computation model and to improve the efficiency of random access to small files.
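The merge-and-index idea underlying both the archive file and sequence file approaches can be sketched in plain Python. This is a conceptual illustration only, independent of Hadoop's actual HAR/SequenceFile APIs; the function names are illustrative: small files are concatenated into one large blob, and an index maps each file name to its (offset, length), so a small file can later be read back with a single random access.

```python
import io

def merge_small_files(files):
    """Concatenate small files into one large blob.

    Returns the merged bytes plus an index mapping each
    file name to its (offset, length) within the blob.
    """
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        offset = blob.tell()          # position where this file starts
        blob.write(data)
        index[name] = (offset, len(data))
    return blob.getvalue(), index

def read_small_file(blob, index, name):
    """Random access: fetch one small file via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Example: two small files merged, then one read back by name.
files = {"a.txt": b"alpha", "b.txt": b"bravo"}
blob, index = merge_small_files(files)
print(read_small_file(blob, index, "b.txt"))  # prints b'bravo'
```

The key point mirrored from the thesis: because the name node tracks only the single merged file rather than every small file, its metadata load drops, while the per-file index preserves efficient random access.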
Keywords/Search Tags: Hadoop, Massive Small Files, MapReduce, Merging, Index