The Research On Storage Of Massive Small Air Cargo Files Based On Hadoop

Posted on: 2019-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: W X Li
GTID: 2348330569488281
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of the e-commerce industry, the volume of logistics data keeps increasing, and traditional storage methods struggle to cope with today's massive numbers of air cargo files. As a result, more and more companies are turning to newer storage systems such as the Hadoop Distributed File System (HDFS). Air cargo data mainly refers to the various types of message data generated in air cargo systems, including the data produced during data exchange. Most of this data takes the form of massive numbers of small files, typically under 10 KB each, hence the name "air cargo small files". However, HDFS was designed mainly for storing large files; when it stores large numbers of small files, it suffers from low storage efficiency, high memory consumption, and excessive load on NameNode resources. Developing an HDFS storage method for small air cargo files is therefore one of the problems the air cargo field needs to solve. To address the storage of massive small air cargo files in HDFS, this thesis analyzes the key data and characteristics of aviation logistics and proposes a Hadoop-based solution for large-scale air cargo storage. The main contributions are as follows:

(1) A small-file merging algorithm for air cargo based on file association features. It covers preprocessing of the small files, extraction of their key data, and calculation of the correlation between small files. Files are then merged according to that correlation, which improves storage and access efficiency.

(2) A caching and prefetching scheme for air cargo data. It builds a file index structure and a metadata cache and optimizes the cached data to speed up reading of massive small files. The Redis in-memory database serves as the cache database, reducing the pressure on the NameNode and improving the overall performance of HDFS.

Experiments and analysis verify that the proposed merging algorithm clearly outperforms both native HDFS and the SequenceFile-based storage solution in storage time and NameNode pressure, and that the cache-prefetching optimization is significantly better than both in the reading time of air cargo files.
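The two contributions in the abstract (merging correlated small files, then reading them through a cached index with prefetching) can be illustrated with a minimal Python sketch. It rests on assumptions that are not from the thesis: correlation is approximated by Jaccard similarity over extracted key fields (the field names such as waybill and flight numbers are hypothetical), grouping is a simple greedy pass, merged files are in-memory byte strings rather than HDFS files, and an in-process LRU dictionary stands in for Redis. The thesis keywords mention PageRank, so its actual correlation or prefetch ranking very likely differs from this sketch.

```python
from collections import OrderedDict


def jaccard(a, b):
    """Jaccard similarity of two key-field sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_correlation(files, threshold=0.5, max_bytes=128 * 1024 * 1024):
    """Greedily put each file into the first group whose seed key set is
    similar enough and whose merged size stays under max_bytes.
    Both defaults are hypothetical; 128 MB is roughly one HDFS block."""
    groups = []
    for name, (keys, data) in files.items():
        for g in groups:
            if jaccard(keys, g["keys"]) >= threshold and g["size"] + len(data) <= max_bytes:
                g["names"].append(name)
                g["size"] += len(data)
                break
        else:
            # No similar group with room: this file seeds a new group.
            groups.append({"keys": keys, "names": [name], "size": len(data)})
    return [g["names"] for g in groups]


def build_store(files, threshold=0.5):
    """Merge each group into one blob; return the blobs, a global index
    name -> (blob_id, offset, length), and the member names of each blob."""
    blobs, index, members = [], {}, []
    for names in group_by_correlation(files, threshold):
        blob_id = len(blobs)
        blob = bytearray()
        for name in names:
            data = files[name][1]
            index[name] = (blob_id, len(blob), len(data))
            blob.extend(data)
        blobs.append(bytes(blob))
        members.append(names)
    return blobs, index, members


class MetadataCache:
    """In-process LRU cache of index entries (a stand-in for Redis)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, name):
        if name not in self._entries:
            return None
        self._entries.move_to_end(name)  # mark as most recently used
        return self._entries[name]

    def put(self, name, entry):
        self._entries[name] = entry
        self._entries.move_to_end(name)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def read_small_file(name, blobs, index, members, cache):
    """Serve a read through the cache. On a miss, consult the global index
    (the slow path, standing in for a NameNode or index-file lookup) and
    prefetch the entries of every file merged into the same blob."""
    entry = cache.get(name)
    if entry is None:
        entry = index[name]
        for peer in members[entry[0]]:
            cache.put(peer, index[peer])
    blob_id, offset, length = entry
    return blobs[blob_id][offset:offset + length]


# Hypothetical messages: (extracted key fields, raw bytes).
files = {
    "msg_a": ({"awb=784-0001", "flt=CZ3101"}, b"AAAA"),
    "msg_b": ({"awb=784-0001", "flt=CZ3101", "dest=PEK"}, b"BB"),
    "msg_c": ({"awb=999-0042", "flt=MU5101"}, b"CCC"),
}
blobs, index, members = build_store(files)
cache = MetadataCache(capacity=8)
print(members)                                                 # [['msg_a', 'msg_b'], ['msg_c']]
print(read_small_file("msg_a", blobs, index, members, cache))  # b'AAAA'
print(cache.get("msg_b"))                                      # (0, 4, 2), prefetched on the miss
```

The prefetch policy leans on the same idea as the merge step: files that were grouped because they share key data are assumed likely to be read together, so one index lookup warms the cache for the whole group.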
Keywords/Search Tags: Air Cargo, Massive Small Files, HDFS, PageRank, Cache