Research Of Small Files Storage Method Based On HDFS

Posted on:2014-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q W Dong

GTID:2248330398452534

Subject:Computer Science and Technology

Abstract/Summary:

With the continuous development of science and technology, digital information is growing explosively, and traditional storage methods can no longer meet the demand of storing massive data. Storing and processing massive data efficiently has therefore become an urgent problem. At present, many large enterprises use HDFS (the Hadoop Distributed File System) to store massive data. HDFS was originally designed to store large files with good reliability and scalability. With the development of the Internet, however, HDFS has increasingly been used to store small files, and this has exposed its shortcomings: the storage of small files has become a bottleneck that limits the overall performance of HDFS.

This paper studies the problem of storing small files in HDFS and proposes three algorithms that address the processing of small files before they are stored in HDFS and their retrieval after storage. First, we introduce a Small Files Merging Algorithm based on Feature Type and Sequence Table. The algorithm extracts the characteristics of the small files and the data types to which those characteristics belong, merges the small files in a streaming (flow-through) fashion, and creates an index file keyed by file name that is managed centrally by the NameNode. Second, we present a DataNode Pre-Allocation Algorithm based on Data Feature. Its purpose is to improve the efficiency of the NameNode and to reduce the performance impact on HDFS as a whole caused by NameNode overload. Third, we propose a Small Files Retrieval Algorithm based on Frequency of Access, which quickly locates the required small files among a large number of index files. It borrows the ideas of virtual storage and page replacement: when users search, index files are loaded into virtual memory and replaced according to their access frequency, so that the required index file can be hit quickly.

To test system performance, we apply the three algorithms to three different use cases, designed by adjusting the percentage of small files and the thresholds used in the algorithms. Experimental results show that the three algorithms effectively improve the efficiency of storing and reading small files in HDFS and optimize the storage performance of HDFS as a whole.
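
The first algorithm merges small files by feature type in a single streaming pass and records an index keyed by file name. The sketch below illustrates that idea under loud assumptions. The "feature type" is approximated by the file extension, a hypothetical stand-in for the thesis's feature extraction. The sequence table is not modelled. In-memory dictionaries stand in for the merged HDFS files and the NameNode-managed index, and no HDFS API is called. The names `feature_type`, `merge_small_files` and `read_small_file`, as well as the `block_size` threshold, are hypothetical.

```python
from collections import defaultdict


def feature_type(name):
    """Hypothetical feature: lowercase file extension, or 'none' if absent."""
    dot = name.rfind(".")
    return name[dot + 1:].lower() if dot != -1 else "none"


def merge_small_files(files, block_size):
    """Merge (name, bytes) pairs into per-feature-type containers in one pass.

    Returns (containers, index): containers maps container id -> merged bytes,
    index maps file name -> (container id, offset, length).
    """
    open_bufs = {}                  # feature type -> (container id, buffer being filled)
    counters = defaultdict(int)     # feature type -> next container sequence number
    containers, index = {}, {}
    for name, data in files:        # stream: each file is handled once, in arrival order
        ftype = feature_type(name)
        cid, buf = open_bufs.get(ftype, (None, None))
        # Open a new container if none exists yet for this type, or if appending
        # would push a non-empty container past the threshold.
        if buf is None or (buf and len(buf) + len(data) > block_size):
            if buf:
                containers[cid] = bytes(buf)   # seal the full container
            cid = f"{ftype}-{counters[ftype]}"
            counters[ftype] += 1
            buf = bytearray()
            open_bufs[ftype] = (cid, buf)
        index[name] = (cid, len(buf), len(data))
        buf.extend(data)
    for cid, buf in open_bufs.values():        # seal the still-open containers
        containers[cid] = bytes(buf)
    return containers, index


def read_small_file(containers, index, name):
    """Locate a small file through the index and slice it out of its container."""
    cid, offset, length = index[name]
    return containers[cid][offset:offset + length]
```

Keeping one open container per feature type lets files of the same type land together without buffering the whole input. This is one plausible reading of "flow-through" merging. A file larger than `block_size` simply gets a container of its own.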
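
The third algorithm keeps index files in a bounded memory area and replaces them according to access frequency, much like page replacement in virtual memory. The following is a minimal sketch that rests on several assumptions. It uses a least-frequently-used (LFU) policy with ties broken by load order. It relies on a hypothetical `load_index` callback that reads one index file, and it assumes the caller already knows which index file covers a given name. `IndexFileCache` and its methods are illustrative names, not the thesis's implementation.

```python
from collections import defaultdict


class IndexFileCache:
    """Bounded in-memory set of index files with LFU replacement (illustrative)."""

    def __init__(self, capacity, load_index):
        self.capacity = capacity        # max number of index files held in memory
        self.load_index = load_index    # hypothetical loader: index id -> {name: location}
        self.cache = {}                 # resident index files, in load order
        self.freq = defaultdict(int)    # access count per index id, kept across evictions
        self.loads = 0                  # how many times an index file was (re)loaded

    def _get_index(self, index_id):
        self.freq[index_id] += 1
        if index_id not in self.cache:
            if len(self.cache) >= self.capacity:
                # Evict the least frequently accessed resident index file;
                # min() returns the first minimum, i.e. the earliest loaded on ties.
                victim = min(self.cache, key=lambda i: self.freq[i])
                del self.cache[victim]
            self.cache[index_id] = self.load_index(index_id)
            self.loads += 1
        return self.cache[index_id]

    def lookup(self, index_id, file_name):
        """Return the stored location of file_name, or None if the index lacks it."""
        return self._get_index(index_id).get(file_name)
```

With a capacity of two, an index file that is searched repeatedly accumulates a high count and stays resident. A rarely used one is the first to be replaced when a new index file must be loaded.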

Keywords/Search Tags:

HDFS, Data Feature, Small files, Storage

Related items

1	Research And Optimization Of The Distributed Storage On HDFS
2	Research On Efficient Storage Of Small Files In Mobile Ultrasound Detection Based On HDFS
3	The Research Of HDFS Optimization Towards Lots Of Small Files Accessing And Storage
4	Research And Implementation Of Small File Storage Model Based On HDFS
5	Research And Application Of Small Files Storage Method Based On HDFS
6	Research And Implementation Of A Strategy To Optimize The Storage Of Small Files On HDFS
7	The Research And Implementation Of Mass Small File Storage System
8	Design And Implementation Of Secure Cloud Storage System Based On HDFS Small File Processing
9	Reading And Writing Strategy Research Of Massive Small Files Based On HDFS
10	Research And Optimization Of Storage Performance Of Massive Small Files In Cloud Environment