
Research On Small File Access Technology Based On Hadoop

Posted on: 2021-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: H M Liu | Full Text: PDF
GTID: 2518306470470194 | Subject: Software engineering
Abstract/Summary:
With the development of information technology and the spread of networks, the data tied to daily life is growing explosively. Although the Hadoop Distributed File System (HDFS) is widely used for distributed storage, it hits performance bottlenecks when handling large numbers of small files: (1) NameNode memory pressure, load imbalance, and low memory-space utilization; (2) a file access mechanism that degrades read performance. To address these problems, this thesis carries out research and technical improvements on two fronts: the storage performance and the read performance of small files.

For storage performance, a file-size threshold is first derived from data analysis and verified. Based on the historical access logs of small files, the Apriori algorithm is used to preprocess the selected small files; from the log analysis, a correlation probability model is built to compute the relevance between files, and a merging algorithm based on a directed graph is designed, which effectively improves NameNode memory utilization.

For read performance, the thesis changes where metadata is stored and builds a multi-level metadata index keyed on merge time. File heat is introduced into the cache, the classical LRU replacement strategy is improved, and a heat-based LRU replacement strategy is proposed. Taking correlation as the driving factor for file prefetching, a correlation-based prefetching mechanism is proposed, which effectively improves file access efficiency.

Finally, building on the above research, a general Hadoop-based storage system for massive files is designed and implemented, and its performance is tested on a pseudo-distributed platform against the original HDFS scheme, the HAR scheme, and the MPM scheme, taking metadata consumption, file read time, and file upload time as the quantitative metrics. The experimental results show that the proposed scheme relieves NameNode pressure and speeds up file reading and uploading when a distributed file system accesses massive small files, providing technical support for the massive-small-file access problem.
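To make the heat-based LRU idea concrete, the following is a minimal sketch of how such a replacement policy might work: each cached entry carries a heat score that decays on every access and is topped up by new hits, and eviction removes the coldest entry, falling back to plain LRU recency to break ties. The class name, the decay formula, and all parameters are illustrative assumptions, not the thesis's actual implementation.

```python
class HeatLRUCache:
    """Toy cache that evicts the entry with the lowest 'heat' score.

    Heat combines access frequency with recency: on each access, past
    heat decays by a factor and a fixed contribution is added. The
    scoring formula here is an illustrative assumption.
    """

    def __init__(self, capacity, decay=0.5):
        self.capacity = capacity
        self.decay = decay      # weight given to past heat on each access
        self.clock = 0          # logical time, advanced on every operation
        self.entries = {}       # key -> (value, heat, last_access_time)

    def _touch(self, key):
        value, heat, _ = self.entries[key]
        # Older heat decays; each fresh access adds a fixed contribution.
        self.entries[key] = (value, heat * self.decay + 1.0, self.clock)

    def get(self, key):
        self.clock += 1
        if key not in self.entries:
            return None
        self._touch(key)
        return self.entries[key][0]

    def put(self, key, value):
        self.clock += 1
        if key in self.entries:
            _, heat, last = self.entries[key]
            self.entries[key] = (value, heat, last)
            self._touch(key)
            return
        if len(self.entries) >= self.capacity:
            # Evict the coldest entry; break heat ties by least-recent
            # access, which degenerates to ordinary LRU.
            victim = min(self.entries,
                         key=lambda k: (self.entries[k][1],
                                        self.entries[k][2]))
            del self.entries[victim]
        self.entries[key] = (value, 1.0, self.clock)
```

In this sketch a frequently read small file accumulates heat and survives eviction even when it was not the most recently touched entry, which is the behavior the abstract attributes to the improved strategy.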
Keywords/Search Tags: Data storage, HDFS, Small file, Merging algorithm, Caching mechanisms