The Research On Storage Of Massive Small Air Cargo Files Based On Hadoop

Posted on:2019-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:W X Li

Full Text:PDF

GTID:2348330569488281

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of the e-commerce industry, the scale of logistics data keeps growing. Traditional storage methods struggle to handle today's massive volume of air cargo files. As a result, more and more companies are turning to a newer storage approach, the Hadoop Distributed File System (HDFS). Air cargo data mainly refers to the various types of message data generated in the air cargo system, including data produced during data exchange. It mostly takes the form of massive numbers of small files, generally no larger than 10 KB each, which are therefore referred to as small air cargo files. However, because HDFS is designed primarily for storing large files, storing large numbers of small files raises many problems, such as low storage efficiency, high memory consumption, and excessive use of NameNode resources. Researching how to store small air cargo files in HDFS is therefore one of the problems that needs to be solved in the air cargo field.

To solve the problem of storing massive small air cargo files in HDFS, this paper analyzes the key data and characteristics of aviation logistics and proposes a Hadoop-based large-scale storage solution for air cargo data. The main contents are as follows: (1) An air cargo small-file merging algorithm based on file association features, which comprises preprocessing the small files, extracting key data, and calculating the correlation between small files. The algorithm merges small files according to their correlation, improving file storage and access efficiency. (2) A caching and prefetching scheme for air cargo data, which establishes a file index structure and a metadata cache and optimizes the cached data to speed up the reading of massive small files. It uses the Redis in-memory database as the cache database to reduce the pressure on the NameNode and improve the overall performance of HDFS.

Experiments and analysis verify that the proposed merging algorithm shows a clear advantage over the native HDFS and SequenceFile storage solutions in storage time and NameNode pressure, and that the cache prefetching optimization scheme performs considerably better than the native HDFS and SequenceFile storage solutions in the read time of air cargo files.
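
The general idea behind merging correlated small files and reading them back through an index can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the grouping key (here a hypothetical air waybill identifier supplied by the caller) stands in for the correlation computation, plain dictionaries stand in for HDFS and for the Redis cache, and all function and variable names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical index entry: (merged file name, byte offset, byte length).
IndexEntry = Tuple[str, int, int]


def merge_small_files(
    files: Dict[str, bytes],
    group_of: Dict[str, str],
) -> Tuple[Dict[str, bytes], Dict[str, IndexEntry]]:
    """Concatenate small files sharing a group key into one blob per group.

    `group_of` is an assumed stand-in for a correlation step: files mapped
    to the same key are treated as related and stored together.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(files):
        grouped[group_of[name]].append(name)

    merged: Dict[str, bytes] = {}
    index: Dict[str, IndexEntry] = {}
    for key, names in grouped.items():
        buf = bytearray()
        for name in names:
            # Record where this small file starts and how long it is.
            index[name] = (key, len(buf), len(files[name]))
            buf.extend(files[name])
        merged[key] = bytes(buf)
    return merged, index


def read_small_file(
    name: str,
    merged: Dict[str, bytes],
    index: Dict[str, IndexEntry],
    cache: Dict[str, bytes],
) -> bytes:
    """Read one small file, prefetching its group neighbours into the cache.

    `cache` is a plain dict standing in for an external cache such as Redis.
    """
    if name in cache:
        return cache[name]
    key, _, _ = index[name]
    blob = merged[key]
    # Prefetch every file stored in the same merged blob, on the assumption
    # that related files tend to be read together.
    for other, (other_key, offset, length) in index.items():
        if other_key == key:
            cache[other] = blob[offset:offset + length]
    return cache[name]
```

With this layout, a storage system would track one large object per group instead of one object per small file, and a read of one message warms the cache for the related messages in the same group.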

Keywords/Search Tags:

Air Cargo, Massive Small Files, HDFS, PageRank, Cache


Related items

1	Research And Design Of Massive Small Files Merging Based On Hadoop
2	Research On Storage Strategy Of Massive Small Files Based On HDFS
3	Research And Implementation Of Mass Small File Storage System Based On HDFS
4	Research And Optimization Of Mass Small Files Based On HDFS
5	The Research And Implementation Of Storing Massive Small Files In Cloud Storage
6	Optimization Of Small Files Accessed Base On MapFile In HDFS
7	Research On The Optimization Of Small Files Processing And Replication Strategy Based On HDFS
8	Reading And Writing Strategy Research Of Massive Small Files Based On HDFS
9	A Strategy To Deal With Massive Small Files In Hadoop Distributed File Systems
10	Research On Efficient Storage Of Small Files In Mobile Ultrasound Detection Based On HDFS