
Research On Storage Strategy Of Massive Small Files Based On HDFS

Posted on: 2018-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: S K Xu
Full Text: PDF
GTID: 2348330563452692
Subject: Computer technology

Abstract/Summary:
In recent years, driven by the rapid development of the mobile Internet, the Internet of Things, and cloud computing, collected data has grown explosively: data comes in many types, in large volumes, and at high velocity, and over time this accumulation produces hundreds of billions or even trillions of small files. How to store these small files efficiently has become a problem widely recognized by both academia and industry. As a widely adopted distributed infrastructure, Hadoop and its distributed storage system HDFS (Hadoop Distributed File System) have become the first choice for massive file storage; HDFS uses a NameNode and DataNodes for file management and storage. However, HDFS was designed to store large streaming files and is not well suited to storing large numbers of small files. This thesis therefore builds on the HDFS platform and, starting from storage-optimization and access strategies for massive small files, focuses on two problems in the current NameNode design: the excessive memory consumed by small-file metadata, and the low efficiency of reading small files from HDFS. The main results of this thesis are as follows:

(1) When massive small files are stored on HDFS, the NameNode keeps a metadata entry in memory for every small file, so the more small files there are, the more metadata accumulates and the greater the NameNode memory consumption. To address this, the thesis designs a small-file upload processing module consisting of four functional units. First, a determination unit filters the files in a directory to find those matching small-file characteristics; then a file processing unit classifies the small files by their related characteristics; next, a file merging unit merges each class of small files into a large file; finally, when a small file must be added to an already merged file, a file appending unit appends it. The appending unit reduces the number of merged files and metadata entries and makes file management more convenient.

(2) When reading large numbers of small files from HDFS, every read of a small file requires an interaction with the NameNode, and the small files are scattered across DataNodes, so large-scale small-file reads are very inefficient. To address this, the thesis designs a small-file reading method. To improve read efficiency, an index table based on the MySQL MEMORY storage engine is designed, while a client-side cache and an independent distributed cache on the DataNodes hold the required data. To raise the cache hit rate, a file prefetching mechanism is employed.

Experiments in the thesis verify the effectiveness of the small-file upload framework and reading method by comparing file upload speed, memory usage, and file read efficiency. The results show that the proposed scheme relieves the memory pressure on the NameNode and effectively improves the speed of small-file uploads and reads.
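As an illustration of the merging unit in (1), the following Java sketch concatenates local small files into a single merged HDFS file while recording each file's offset and length for later indexing. This is a minimal sketch, not the thesis's actual implementation: the 4 MB small-file threshold, the IndexEntry record, and the raw-concatenation container format are all assumptions, since the abstract does not specify them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileMerger {

    /** One index entry per small file inside the merged container (hypothetical layout). */
    public static class IndexEntry {
        public final String name;  // original small-file name
        public final long offset;  // byte offset inside the merged file
        public final int length;   // length of the small file in bytes
        public IndexEntry(String name, long offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    // "Determine unit" cutoff: files at or above this size are not treated as small.
    // The thesis gives no concrete threshold; 4 MB is an assumed value.
    private static final long SMALL_FILE_THRESHOLD = 4L * 1024 * 1024;

    public static List<IndexEntry> merge(List<String> localFiles, String mergedHdfsPath)
            throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        List<IndexEntry> index = new ArrayList<>();
        long offset = 0;

        try (FSDataOutputStream out = fs.create(new Path(mergedHdfsPath))) {
            for (String name : localFiles) {
                if (Files.size(Paths.get(name)) >= SMALL_FILE_THRESHOLD) {
                    continue; // not a small file: leave it to normal HDFS storage
                }
                byte[] bytes = Files.readAllBytes(Paths.get(name));
                out.write(bytes); // "merging unit": append content to the container
                index.add(new IndexEntry(name, offset, bytes.length));
                offset += bytes.length;
            }
        }
        return index; // entries to be persisted in the MySQL index table
    }
}
```

Because only the one merged file is registered with the NameNode, the per-file metadata cost collapses to a single entry regardless of how many small files the container holds.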
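The file appending unit in (1) can similarly be sketched with HDFS's append API plus a row insertion into the index table. The JDBC URL, the credentials, and the file_index schema shown in the comment are hypothetical stand-ins for whatever the thesis actually uses, and fs.append assumes the cluster has append support enabled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileAppender {

    // Hypothetical MEMORY-engine index table, e.g.:
    // CREATE TABLE file_index (name VARCHAR(255) PRIMARY KEY, merged_path VARCHAR(255),
    //                          offset BIGINT, length INT) ENGINE=MEMORY;
    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/hdfs_index";

    public static void appendSmallFile(String localFile, String mergedHdfsPath)
            throws IOException, SQLException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path merged = new Path(mergedHdfsPath);

        byte[] bytes = Files.readAllBytes(Paths.get(localFile));
        long offset = fs.getFileStatus(merged).getLen(); // new record starts at current EOF

        // "File appending unit": extend the existing merged file instead of creating
        // a new one, so the NameNode sees no additional metadata entry.
        try (FSDataOutputStream out = fs.append(merged)) {
            out.write(bytes);
        }

        // Register the new small file in the index table.
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO file_index (name, merged_path, offset, length) VALUES (?, ?, ?, ?)")) {
            ps.setString(1, localFile);
            ps.setString(2, mergedHdfsPath);
            ps.setLong(3, offset);
            ps.setInt(4, bytes.length);
            ps.executeUpdate();
        }
    }
}
```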
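For the reading method in (2), a minimal sketch of the client-side path might look as follows: look up the small file's location in the MySQL MEMORY index table, serve repeated requests from a bounded client-side cache, and fetch the byte range of the merged file with a single positioned read. The table schema, cache capacity, and connection details are assumptions; the thesis's DataNode-side distributed cache and its prefetching mechanism, which would warm this cache with files likely to be requested next, are not implemented here.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileReader {

    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/hdfs_index";

    // Client-side cache: a bounded LRU map standing in for the thesis's client cache.
    private final Map<String, byte[]> cache =
            new LinkedHashMap<String, byte[]>(1024, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > 1000; // assumed capacity
                }
            };

    public byte[] read(String fileName) throws IOException, SQLException {
        byte[] cached = cache.get(fileName);
        if (cached != null) {
            return cached; // cache hit: no NameNode or DataNode interaction at all
        }
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT merged_path, offset, length FROM file_index WHERE name = ?")) {
            ps.setString(1, fileName);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new IOException("No index entry for " + fileName);
                }
                byte[] bytes = readRange(rs.getString(1), rs.getLong(2), rs.getInt(3));
                cache.put(fileName, bytes);
                return bytes;
            }
        }
    }

    // Seek directly into the small file's region of the merged HDFS file; one
    // NameNode lookup for the merged file serves many small-file reads.
    private byte[] readRange(String mergedPath, long offset, int length) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[length];
        try (FSDataInputStream in = fs.open(new Path(mergedPath))) {
            in.readFully(offset, buf); // positioned read: seek + read in one call
        }
        return buf;
    }
}
```

The MEMORY engine keeps the index table entirely in RAM, so the metadata lookup avoids disk I/O on the index side as well, which is presumably why the thesis chose it over an on-disk table.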
Keywords/Search Tags:Hadoop, HDFS, massive small file, file merge, cache mechanism