
Research and Implementation of Massive Small File Storage Based on HDFS

Posted on: 2019-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
Full Text: PDF
GTID: 2428330590975440
Subject: Software engineering

Abstract/Summary:
With the development of the Internet, the scale of data that companies and research institutions must store and process has grown exponentially, and traditional single-machine storage can no longer keep up in capacity, usage cost, or maintenance cost. Hadoop, an open-source distributed framework developed by the Apache Foundation, can be deployed on large numbers of inexpensive machines and processes and stores data through a simple programming model, greatly reducing cost; it has therefore become the choice of more and more companies and research institutions. At the same time, with the rise of high-frequency applications such as WeChat, QQ, and Toutiao, the demand to store small files such as voice clips, pictures, and sensor data keeps growing. HDFS, Hadoop's distributed file system, is designed mainly for storing large files and does not consider the small-file scenario: uploading massive numbers of small files directly to HDFS puts heavy memory pressure on the NameNode and lowers file-access efficiency. Although Hadoop later added some compensating mechanisms, they are still not flexible enough. This thesis adopts the idea of merging small files into large files for storage, so that the NameNode's load is greatly reduced because Hadoop then stores only large files.

First, this thesis studies the currently popular distributed file systems GFS, HDFS, and TFS, analyzes the main modules of HDFS, examines the implementation principles of DFSClient, NameNode, DataNode, and other modules in depth, and shows theoretically why HDFS is ill-suited to storing small files. Second, drawing on a broad survey of domestic and foreign literature on small-file processing for HDFS, it discusses and analyzes the three solutions that ship with Hadoop and proposes two small-file schemes, one based on MapFile and one based on a multidimensional table; it describes both implementations in detail and compares their usage scenarios. A Redis-based index is then introduced to further speed up small-file queries. Hadoop natively provides two HDFS access methods, the shell and HTTP, both of which are inconvenient and unintuitive; this thesis therefore implements a CBFS-based virtual file system that offers an experience similar to operating a local file system. Finally, the thesis compares the traditional HDFS storage scheme with the proposed scheme. Test results show that the proposed scheme effectively reduces NameNode memory pressure and file read/write latency, demonstrating the feasibility and effectiveness of the small-file optimization presented in this thesis.
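The core idea of the merging schemes above can be illustrated with a minimal sketch in plain Python (not the Hadoop or Redis APIs): many small files are appended into one large container file, and a filename-to-(offset, length) index is kept alongside it, so the NameNode only ever sees one large file. The `SmallFileMerger` class and its method names are illustrative inventions, not the thesis's actual implementation; in the thesis the index role would be played by a MapFile or by Redis.

```python
import io

class SmallFileMerger:
    """Toy model of small-file merging: one container + an offset index."""

    def __init__(self):
        self.container = io.BytesIO()  # stands in for one large HDFS file
        self.index = {}                # filename -> (offset, length)
        self.end = 0                   # current append position

    def put(self, name, data):
        # Append the small file's bytes and record where they landed.
        self.container.seek(self.end)
        self.container.write(data)
        self.index[name] = (self.end, len(data))
        self.end += len(data)

    def get(self, name):
        # A single index lookup plus a ranged read recovers the small file,
        # with no per-file metadata on the (simulated) NameNode.
        offset, length = self.index[name]
        self.container.seek(offset)
        return self.container.read(length)

merger = SmallFileMerger()
merger.put("a.txt", b"hello")
merger.put("b.txt", b"world!")
print(merger.get("a.txt"))  # b'hello'
```

Replacing the in-memory `dict` with Redis, as the thesis proposes, keeps lookups fast while letting the index survive client restarts and be shared across clients.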
Keywords/Search Tags:Distributed File System, HDFS, Small File Merge, MapFile, Multidimensional Table