
Research On The Optimization Of Small Files Processing And Replication Strategy Based On HDFS

Posted on: 2015-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Liu
Full Text: PDF
GTID: 2298330431483981
Subject: Computer software and theory
Abstract/Summary:
As the open source implementation of the Google File System, the Hadoop Distributed File System (HDFS) has proved efficient at processing large files but inefficient when dealing with small files. This is mainly due to the heavy NameNode memory consumption caused by massive numbers of small files, and the single NameNode can easily become a performance bottleneck. In addition, HDFS determines storage locations using a static three-replica strategy together with location awareness. This strategy partially achieves fault tolerance and load balancing, but it has obvious defects: it is inflexible, which causes serious waste of storage resources and further harms load balancing.

Considering these deficiencies, this paper presents an optimization scheme for small files based on an indexing mechanism. The main idea is to have DataNodes take over the role of the NameNode in handling small files, which eases the pressure on the NameNode and also addresses the single NameNode's bottleneck when dealing with a large number of requests. In addition, we use a caching mechanism to further optimize the efficiency of reading files. We also propose a dynamic replication strategy and implement a dynamic replica placement algorithm based on a DataNode status index, in order to optimize the storage efficiency and load-balancing capabilities of HDFS. The main innovations of this paper are as follows:

1. On the basis of existing application-specific optimization algorithms, we propose a more general scheme based on a small-file indexing mechanism, which solves the problem of storing and retrieving small files well.
2. We apply a cache mechanism to small-file processing to optimize the I/O operations of HDFS and improve its functionality.
3. We propose a new comprehensive indicator that quantifies a DataNode's status, then put forward a dynamic replication strategy and implement a dynamic replica placement algorithm.

Following the design scheme, we carried out corresponding simulation experiments. The experimental results show that the design improves, to varying degrees, the accuracy and soundness of data deduplication, small-file I/O speed, and NameNode memory usage, which illustrates the effectiveness and soundness of the design.
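The abstract does not give the details of the indexing and caching design, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes small files are merged into one container, an index maps each file name to an (offset, length) pair, and an LRU cache serves repeated reads. The class name `SmallFileStore`, its methods, and the in-memory container are all hypothetical; no real HDFS API is used.

```python
from collections import OrderedDict


class SmallFileStore:
    """Toy in-memory stand-in for a merged container plus index (hypothetical)."""

    def __init__(self, cache_capacity=2):
        self.container = bytearray()   # merged bytes of all small files
        self.index = {}                # file name -> (offset, length)
        self.cache = OrderedDict()     # LRU cache: file name -> bytes
        self.cache_capacity = cache_capacity
        self.container_reads = 0       # reads that missed the cache

    def put(self, name, data):
        """Append a small file to the container and record it in the index."""
        if name in self.index:
            raise ValueError(f"{name} is already stored")
        self.index[name] = (len(self.container), len(data))
        self.container.extend(data)

    def get(self, name):
        """Return file contents, serving from the cache when possible."""
        if name in self.cache:
            self.cache.move_to_end(name)       # mark as most recently used
            return self.cache[name]
        offset, length = self.index[name]      # KeyError if the file is unknown
        self.container_reads += 1
        data = bytes(self.container[offset:offset + length])
        self.cache[name] = data
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data


store = SmallFileStore(cache_capacity=2)
store.put("a.txt", b"alpha")
store.put("b.txt", b"bravo")
store.put("c.txt", b"charlie")
first = store.get("b.txt")    # cache miss: read through the index
second = store.get("b.txt")   # cache hit: no container read
```

In this sketch only one index entry per small file is kept alongside a single container, which is the kind of metadata saving the abstract attributes to the indexing mechanism; where that index lives (NameNode or DataNode) is left out of the toy model.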
Keywords/Search Tags: HDFS, Optimization of Small Files, Indexing Mechanism, Cache, Dynamic Replication Strategy