
Research and Implementation of a Massive Small File Storage System Based on HDFS

Posted on: 2022-06-07 | Degree: Master | Type: Thesis
Country: China | Candidate: F Tian | Full Text: PDF
GTID: 2518306602967599 | Subject: Master of Engineering
Abstract/Summary:
With the development of the Internet industry, key technologies in fields such as big data, cloud computing, and artificial intelligence have advanced enormously, which in turn has accelerated the adoption of Internet of Things (IoT) technology in many aspects of daily life. The main functions of an IoT big data platform include the collection, storage, and mining of data from target devices, and this data is characteristically massive and heterogeneous. This thesis takes the currently very active topic of massive small file storage as its research subject. Beyond the IoT scenario above, data in the form of small files is in fact everywhere in daily life, such as photos, audio and video, and log files, sharing the two major characteristics of massive volume and heterogeneity. Traditional file systems offer essentially no native support for such workloads and may even crash under them. How to centrally store massive small files while achieving fast retrieval and access has therefore become an urgent problem in the development of the Internet industry.

In response, this thesis proposes a centralized storage solution for massive small files based on HDFS, the underlying file system of the big data processing framework Apache Hadoop. Analysis of the native HDFS architecture shows that it is designed to store large files of coarse granularity. For small file storage, its read efficiency is low, and the excessive single-point pressure that massive file metadata places on the NameNode (by a commonly cited rule of thumb, roughly 150 bytes of NameNode heap per file, directory, or block) can crash the cluster master node and interrupt system service. To address this, the thesis proposes a targeted architecture transformation scheme for massive small file storage scenarios and designs a system-level multi-level data caching strategy, supporting the centralized storage of massive small files in terms of both functionality and read/write performance. The specific research content includes the following aspects:

(1) A series of targeted transformations of the native HDFS system for the centralized storage of massive small files. The new architecture relieves the single-point pressure on the cluster master node caused by the excessive volume of file metadata when storing massive small files. The transformation centers on the data storage method: it replaces the native mechanism in which all metadata is held only by the master node, so that each storage node keeps the metadata of its own files in local memory, only a small amount of hotspot metadata resides in the master node's memory, and the full metadata of the file system is persisted on the master node's disk; caching and synchronization strategies for this metadata are designed accordingly (a minimal illustrative sketch follows this abstract). The processes of data block creation and recovery are also transformed in a targeted manner, maximizing the efficiency of the system's disk usage.

(2) On top of the architecture transformation, a systematic multi-level data caching mechanism is designed to improve the performance of the distributed file system when providing data read and write services. This part covers the selection of the caching strategy and, based on the choice of distributed independent caching, the design of the caching mechanism for the system's multi-level data caching resource pool. On this basis, the hot/cold data replacement strategy within the cache and the adaptive adjustment of the cold data cleaning and asynchronous write-back processes are completed (also sketched below).

(3) Finally, comparison experiments against the native system demonstrate that the new architecture relieves the single-point pressure on the master node and guarantees the functionality of massive small file storage, while the system's read and write performance in small file scenarios surpasses that of the native system.
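To make the hotspot metadata scheme in (1) concrete, the following is a minimal sketch of a bounded, access-ordered metadata cache for the master node, backed by a disk-based store holding the full metadata set. The names (FileMeta, MetaStore, HotspotMetaCache), the per-file fields, and the read-through behavior are illustrative assumptions, not the thesis's actual implementation.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Illustrative per-file metadata: where a small file lives inside a
    // merged block on some storage node. The field choice is an assumption.
    class FileMeta {
        final String path;
        final String storageNode;   // node holding the merged block
        final long offset;          // byte offset inside the merged block
        final long length;          // length of the small file

        FileMeta(String path, String storageNode, long offset, long length) {
            this.path = path;
            this.storageNode = storageNode;
            this.offset = offset;
            this.length = length;
        }
    }

    // Disk-backed store of the full metadata set (hypothetical interface).
    interface MetaStore {
        FileMeta load(String path);    // read-through target on a cache miss
        void persist(FileMeta meta);   // persist full metadata to disk
    }

    // Bounded, access-ordered "hotspot" cache kept in master-node memory.
    class HotspotMetaCache {
        private final int capacity;
        private final MetaStore store;
        private final Map<String, FileMeta> lru;

        HotspotMetaCache(int capacity, MetaStore store) {
            this.capacity = capacity;
            this.store = store;
            // accessOrder=true makes LinkedHashMap behave as an LRU list;
            // removeEldestEntry evicts once the hotspot budget is exceeded.
            this.lru = new LinkedHashMap<String, FileMeta>(capacity, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, FileMeta> eldest) {
                    return size() > HotspotMetaCache.this.capacity;
                }
            };
        }

        // A hotspot hit is served from memory; a miss reads through to disk.
        synchronized FileMeta lookup(String path) {
            FileMeta meta = lru.get(path);
            if (meta == null) {
                meta = store.load(path);   // cold path: disk lookup
                if (meta != null) {
                    lru.put(path, meta);   // promote to hotspot
                }
            }
            return meta;
        }

        // New metadata is persisted first, then cached as currently hot.
        synchronized void register(FileMeta meta) {
            store.persist(meta);
            lru.put(meta.path, meta);
        }
    }

An access-ordered LinkedHashMap provides LRU eviction with very little code; the thesis's design would additionally need the synchronization strategy it describes for keeping the storage nodes' local metadata consistent with the master node's hotspot view.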
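For the hot/cold replacement and asynchronous write-back in (2), the sketch below shows a cache whose background sweeper asynchronously flushes cold, dirty entries to the backing store and evicts them, with a crude stand-in for the adaptive adjustment of the cold threshold. The Entry layout, the sweep interval, and the adjustment heuristic are assumptions made for illustration; the abstract does not specify the thesis's actual policy.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Cache with hot/cold replacement: a background sweeper flushes cold
    // dirty entries asynchronously and adapts its own cold threshold.
    class ColdDataEvictingCache {
        private static final class Entry {
            final byte[] data;
            volatile long lastAccess = System.nanoTime();
            volatile boolean dirty;
            Entry(byte[] data, boolean dirty) { this.data = data; this.dirty = dirty; }
        }

        private final Map<String, Entry> cache = new ConcurrentHashMap<>();
        private final ScheduledExecutorService sweeper =
                Executors.newSingleThreadScheduledExecutor();
        // Entries idle longer than this are treated as cold (tunable).
        private volatile long coldThresholdNanos = TimeUnit.SECONDS.toNanos(30);

        ColdDataEvictingCache() {
            // Periodic sweep performs the asynchronous cold-data cleaning.
            sweeper.scheduleAtFixedRate(this::sweep, 10, 10, TimeUnit.SECONDS);
        }

        byte[] read(String path) {
            Entry e = cache.get(path);
            if (e == null) {
                return null;                    // caller falls back to HDFS
            }
            e.lastAccess = System.nanoTime();   // touching keeps it hot
            return e.data;
        }

        void write(String path, byte[] data) {
            cache.put(path, new Entry(data, true)); // dirty until flushed
        }

        private void sweep() {
            long now = System.nanoTime();
            int evicted = 0;
            for (Map.Entry<String, Entry> me : cache.entrySet()) {
                Entry e = me.getValue();
                if (now - e.lastAccess > coldThresholdNanos) {
                    if (e.dirty) {
                        flushToBackingStore(me.getKey(), e.data);
                    }
                    cache.remove(me.getKey(), e);  // evict only if unchanged
                    evicted++;
                }
            }
            // Crude stand-in for adaptive adjustment: heavy eviction
            // shortens the cold threshold, an idle sweep relaxes it.
            if (evicted > 100) {
                coldThresholdNanos =
                        Math.max(coldThresholdNanos / 2, TimeUnit.SECONDS.toNanos(5));
            } else if (evicted == 0) {
                coldThresholdNanos =
                        Math.min(coldThresholdNanos * 2, TimeUnit.MINUTES.toNanos(5));
            }
        }

        // Placeholder for the asynchronous write-back into HDFS, e.g.
        // appending the small file's bytes into its merged block.
        private void flushToBackingStore(String path, byte[] data) {
            // Intentionally left as a stub in this sketch.
        }
    }

A real implementation would also bound total cache memory and coordinate the flush with the merged-block layout in HDFS; this sketch only shows the replacement and write-back flow.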
Keywords/Search Tags:Massive small files, HDFS, Distributed independent cache, Adaptive adjustment