Research And Implementation Of A Strategy To Optimize The Storage Of Small Files On HDFS

Posted on: 2014-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Gao
Full Text: PDF
GTID: 2268330422964742
Subject: Computer technology
Abstract/Summary:
With the explosion of Internet data, data storage and computation have become one of the biggest challenges of the big data age, and a number of cloud storage systems have emerged in recent years. HDFS, an open-source system based on Google GFS and mainly used for storing large volumes of big data, offers high reliability, high availability, and high scalability. HDFS has a master/slave architecture, with one central node (the name node) storing the metadata and many data nodes storing the actual data. A big file is split into several blocks that are stored on data nodes, and each block has three copies distributed across different data nodes. When many small files are stored on HDFS, their metadata consumes a large amount of memory on the central node and may flood it with requests. This thesis first studies an existing solution to the small-file problem that combines and compresses files on the server side, but its read and write performance is poor because of its multilevel lookup process. A new, simpler solution based on file merging is therefore designed. Small files are buffered on the client side and combined into one big file that begins with an index table of the small files it contains; this merged file is then stored on the data nodes as a single file block. The inode structure is extended with a map of small files to reduce memory usage, an index of small files is added so that a small file can be retrieved from the data nodes, and a prefetching strategy is provided to improve read performance. Finally, a test program is designed to measure memory cost and read/write rates. Compared with the original system, memory usage is reduced to 70 percent, writing time is reduced by about 20 percent, and reading time is reduced by about 40 percent.
Keywords/Search Tags:Distributed file system, Name node, Data node, Meta-data, Small files
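
The merged-file layout described in the abstract (an index table of small files placed at the beginning of one big file) can be illustrated with a minimal Python sketch. The byte layout, the field widths, and the function names `merge_small_files`, `read_index`, and `read_small_file` are assumptions made for illustration only; the abstract does not specify the thesis's actual on-disk format, and an in-memory byte string stands in here for an HDFS block.

```python
import struct


def merge_small_files(files):
    """Pack small files into one blob: [header][index table][file data].

    Hypothetical layout, not the thesis's format:
      header      : entry count (4 bytes), index size in bytes (4 bytes)
      index entry : name length (2 bytes), UTF-8 name,
                    offset within the data region (8 bytes), length (8 bytes)
    """
    index = bytearray()
    data = bytearray()
    for name, content in files.items():
        encoded = name.encode("utf-8")
        index += struct.pack(">H", len(encoded)) + encoded
        index += struct.pack(">QQ", len(data), len(content))
        data += content
    header = struct.pack(">II", len(files), len(index))
    return bytes(header) + bytes(index) + bytes(data)


def read_index(blob):
    """Parse the index table and return {name: (absolute offset, length)}."""
    count, index_size = struct.unpack_from(">II", blob, 0)
    pos = 8
    data_start = 8 + index_size
    table = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, length = struct.unpack_from(">QQ", blob, pos)
        pos += 16
        table[name] = (data_start + offset, length)
    return table


def read_small_file(blob, name):
    """Fetch one small file from the merged blob via the index table."""
    start, length = read_index(blob)[name]
    return blob[start:start + length]


if __name__ == "__main__":
    merged = merge_small_files({"a.txt": b"hello", "c.bin": b"\x00\x01\x02"})
    print(read_small_file(merged, "a.txt"))
```

Because the index sits at the front of the merged file, a reader in this sketch only needs the header and index table to locate any small file, rather than one metadata entry per small file on the central node.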