
Research And Implementation Of Small File Optimization Storage Management System Based On HDFS

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: J T Ni
GTID: 2518306308970829
Subject: Software engineering
Abstract/Summary:
HDFS (Hadoop Distributed File System) provides distributed storage for massive data sets. However, HDFS is designed for large files at the terabyte (TB) scale. When the file system holds a large number of small files, the HDFS storage architecture, which is centered on a single master server (the NameNode), becomes a bottleneck, and files are stored and read inefficiently. This thesis focuses on that problem. It introduces the background of the project and the related technologies, and through experimental verification and process analysis it identifies the underlying causes of the low efficiency of small file storage and reading.

On this basis, the thesis proposes a performance optimization scheme for small file storage, which relieves the NameNode memory bottleneck by merging small files in parallel during the storage process. The scheme includes a file clustering algorithm based on file relevancy, applied during merging so that strongly correlated files are clustered together. It also includes a balanced multi-threaded merging method, so that the related small files in each cluster are merged into a large file close to the block size, making the best use of block storage space.

Building on the storage scheme, the thesis proposes a corresponding read performance optimization scheme based on an independent index server working together with the NameNode. It uses a dual index structure to read small files, which relieves read pressure on the NameNode and improves the read rate of small files. The scheme also provides a file cache and a prefetch module, so that files can be read into the cache in advance and the target file can be located faster. A number of comparative experiments are carried out, and the results show that the proposed solution improves the storage and reading efficiency of HDFS for workloads with a large number of small files.

Based on this optimized access scheme for small files in HDFS, a small file optimization storage management system is developed. The thesis presents the requirement analysis and feasibility analysis of the system, together with its network architecture, function module design, and database table design. It then describes the detailed design and implementation of the user management, rights management, and file management modules. Finally, the results of function and performance tests show that the system can optimize the storage and reading of small files.
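
The idea of merging small files into large files close to the block size, with an index that still locates each small file, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a plain first-fit-decreasing packing heuristic, runs in a single thread, and ignores file relevancy clustering. The names `MergedFile` and `pack_small_files` and the `merged-NNNNN` naming pattern are hypothetical, and the 128 MB default is an assumed block size.

```python
from dataclasses import dataclass, field

# Assumed block size for illustration (128 MB); the thesis does not state one.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class MergedFile:
    """A hypothetical merged container holding several small files."""
    name: str
    used: int = 0
    # small-file name -> (offset, length) inside this merged file
    entries: dict = field(default_factory=dict)


def pack_small_files(sizes, block_size=BLOCK_SIZE):
    """Pack small files into merged files no larger than block_size.

    Uses first-fit decreasing: files are visited from largest to smallest,
    and each one goes into the first merged file that still has room.
    Returns the merged files and an index mapping each small file name to
    (merged-file name, offset, length).
    """
    merged = []
    index = {}
    # Sort by size descending, then by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > block_size:
            raise ValueError(f"{name} is larger than one block")
        target = next((m for m in merged if m.used + size <= block_size), None)
        if target is None:
            target = MergedFile(name=f"merged-{len(merged):05d}")
            merged.append(target)
        target.entries[name] = (target.used, size)
        index[name] = (target.name, target.used, size)
        target.used += size
    return merged, index
```

With such an index, reading a small file reduces to one lookup followed by a ranged read of the merged file, which is the kind of work the abstract assigns to a separate index server rather than to the NameNode.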
Keywords/Search Tags: HDFS, large number of small files, file clustering, distributed storage management system