Research on Increasing the I/O Speed of Small Files in HDFS

Posted on: 2012-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Jiang
GTID: 2178330335460746
Subject: Computer Science and Technology
Abstract/Summary:
Cloud computing is one of today's most prominent topics, and the Hadoop Distributed File System (HDFS), Hadoop's default distributed file system, has become a de facto standard because it is reliable, scalable, and performs well when storing large files. When it stores a large number of small files, however, it becomes very slow, because a single Namenode has to handle every request. This thesis first briefly introduces the principles and implementation of several popular distributed file systems, then analyzes HDFS in depth, including its architecture, data structures, and blocks. It then examines the shortcomings of existing solutions for handling small files in HDFS.

To relieve the Namenode bottleneck caused by small files, the thesis proposes a different solution: letting the Datanodes cache part of the metadata of small files. When a client requests a small file, the request is first looked up on the Datanode, so that as many requests as possible are served there; only if the Datanode cannot find the data is the request forwarded to the Namenode. This greatly reduces the number of requests the Namenode has to process.

Meanwhile, in Web 2.0 settings, some files are read very frequently and their data becomes hot. To speed up reads in this situation, the thesis provides two methods. First, it dynamically creates additional copies of a hot block to reduce the load on the Datanode that stores it. Second, it lets the Datanode cache frequently read small files to reduce read time. Together, these two methods greatly increase the read speed of hot data.
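
The Datanode-first lookup described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation and does not use any real Hadoop API: the `Namenode` and `Datanode` classes, the `lookup` method, the `capacity` parameter, and the least-recently-used eviction policy are all assumptions made for illustration. It only shows the general idea that a request is answered from the Datanode's metadata cache when possible and falls back to the Namenode otherwise.

```python
from collections import OrderedDict


class Namenode:
    """Toy stand-in for the central metadata server (hypothetical)."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)
        self.requests = 0  # how many lookups reached the Namenode

    def lookup(self, path):
        self.requests += 1
        return self.metadata.get(path)


class Datanode:
    """Toy Datanode that caches small-file metadata (hypothetical)."""

    def __init__(self, namenode, capacity=2):
        self.namenode = namenode
        self.capacity = capacity    # assumed cache size limit
        self.cache = OrderedDict()  # path -> metadata, least recent first

    def lookup(self, path):
        # 1. Try the local metadata cache first.
        if path in self.cache:
            self.cache.move_to_end(path)  # mark as most recently used
            return self.cache[path]
        # 2. On a miss, fall back to the Namenode.
        meta = self.namenode.lookup(path)
        if meta is not None:
            self.cache[path] = meta
            # Assumed policy: evict the least recently used entry.
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
        return meta


if __name__ == "__main__":
    nn = Namenode({"/logs/a.txt": "blk_1", "/logs/b.txt": "blk_2"})
    dn = Datanode(nn)
    dn.lookup("/logs/a.txt")  # miss: forwarded to the Namenode
    dn.lookup("/logs/a.txt")  # hit: served from the Datanode cache
    print("Namenode requests:", nn.requests)
```

In this sketch, repeated reads of the same small file reach the Namenode only once, which is the effect the abstract attributes to Datanode-side metadata caching. How the thesis keeps cached metadata consistent with the Namenode is not stated in the abstract and is not modeled here.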
Keywords/Search Tags: HDFS, small files, distributed file system, dynamic block management