Research On Key Technology Of Small File Storage Based On HDFS

Posted on: 2017-05-17 | Degree: Master | Type: Thesis
Country: China | Candidate: H Se | Full Text: PDF
GTID: 2308330488959184 | Subject: Computer application technology
Abstract/Summary:
A cloud storage system is a large-scale data store that uses a distributed file system as its storage platform. HDFS (Hadoop Distributed File System), the distributed file system of the open-source cloud computing platform Hadoop, is simple in design and widely used. However, HDFS was originally designed to handle large files, and the growth of the Internet has produced an ever-increasing number of small files to process, which poses a challenge to HDFS. HDFS uses a single NameNode to manage the metadata of the whole system and, to improve access efficiency, keeps that metadata in memory. When the system stores a large number of small files, a large amount of metadata is produced, the NameNode's memory footprint grows, and performance suffers. In addition, accessing many small files requires frequent requests to the NameNode, which overloads it and makes it a bottleneck for overall system performance.

To solve this problem and adapt HDFS to small-file storage, this thesis first studies the system architecture and working principles of HDFS in depth. It then proposes an HDFS small-file storage scheme based on a merging strategy and gives the improved system architecture. The scheme merges small files into a large file for storage and builds a corresponding index; on reads, data prefetching and caching strategies are used to improve access efficiency. To further improve lookup and access efficiency for merged files, small files are classified, labelled, and merged according to file size. On the NameNode side, the scheme stores a hash index of the small-file labels, a B+ tree index of the large-file blocks, and an index of the blocks that correspond to small files of different sizes.
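The merge-and-index idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: it works on in-memory bytes rather than HDFS, the names `MergedFile`, `size_class`, and `merge_small_files` are hypothetical, the 1024-byte threshold is an arbitrary choice, and a plain dictionary stands in for the hash index (the B+ tree index is not modelled).

```python
from dataclasses import dataclass, field


@dataclass
class MergedFile:
    """One large container file plus an index of the small files inside it."""
    data: bytearray = field(default_factory=bytearray)
    # Hash index (assumed layout): small-file name -> (offset, length) in `data`.
    index: dict = field(default_factory=dict)

    def append(self, name: str, payload: bytes) -> None:
        # Record where the small file starts and how long it is, then append it.
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name: str) -> bytes:
        # One index lookup, then a slice of the large file.
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


def size_class(payload: bytes, threshold: int = 1024) -> str:
    """Hypothetical two-way classification by size; the threshold is an assumption."""
    return "tiny" if len(payload) < threshold else "small"


def merge_small_files(files: dict) -> dict:
    """Group small files by size class and pack each group into one MergedFile."""
    merged = {}
    for name, payload in files.items():
        merged.setdefault(size_class(payload), MergedFile()).append(name, payload)
    return merged
```

With this layout the metadata service would need one entry per merged file rather than one per small file, which is the memory saving the abstract describes; the per-file `(offset, length)` pairs live in the index instead.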
Then, a three-level caching strategy is designed to reduce the number of requests sent to the NameNode when accessing small files and so improve efficiency. Finally, several sets of experiments show that the proposed scheme effectively improves small-file access efficiency and reduces the memory overhead of the NameNode.
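The abstract does not say what the three cache levels hold or how entries move between them, so the following Python sketch only illustrates the general idea of a tiered cache in front of the NameNode. Everything in it is an assumption: the class name `ThreeLevelCache`, the capacities, the promote-on-hit and demote-on-overflow policy, and the `namenode_lookup` callback that stands in for a real metadata request.

```python
from collections import OrderedDict


class ThreeLevelCache:
    """Tiered cache sketch: check three levels before asking the NameNode."""

    def __init__(self, namenode_lookup, capacities=(2, 4, 8)):
        self.namenode_lookup = namenode_lookup  # hypothetical metadata request
        self.capacities = capacities            # assumed per-level sizes
        self.levels = [OrderedDict() for _ in capacities]
        self.namenode_requests = 0              # counts cache misses

    def _insert(self, level, key, value):
        # Entries pushed past the last level are simply dropped.
        if level >= len(self.levels):
            return
        cache = self.levels[level]
        cache[key] = value
        cache.move_to_end(key)
        # On overflow, demote the oldest entry of this level one level down.
        if len(cache) > self.capacities[level]:
            old_key, old_value = cache.popitem(last=False)
            self._insert(level + 1, old_key, old_value)

    def get(self, key):
        # A hit at any level promotes the entry back to the first level.
        for cache in self.levels:
            if key in cache:
                value = cache.pop(key)
                self._insert(0, key, value)
                return value
        # Miss at every level: this is the only path that contacts the NameNode.
        self.namenode_requests += 1
        value = self.namenode_lookup(key)
        self._insert(0, key, value)
        return value
```

Repeated reads of recently used small files are then served from one of the levels, and `namenode_requests` only grows on a miss at all three.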
Keywords/Search Tags: cloud storage, HDFS, small file, index